Kubernetes Configuration

Cluster Setup

Kubernetes clusters can be configured in various environments:

  • GKE (Google Kubernetes Engine)
  • EKS (Amazon Elastic Kubernetes Service)
  • AKS (Azure Kubernetes Service)
  • On-Premises (kubeadm, Rancher, etc.)
  • Local Development (Minikube, Kind, k3s)

Cluster Configuration by Environment

development  → 2 nodes  (2 CPU, 8GB RAM per node)
staging      → 3 nodes  (4 CPU, 16GB RAM per node)
production   → 5+ nodes (8 CPU, 32GB RAM per node)

Node count and specs should be adjusted based on application requirements.
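
To sanity-check sizing against what a node can actually schedule, each node's allocatable capacity can be inspected:

# List nodes with status, roles, and versions
kubectl get nodes -o wide

# Show allocatable CPU/memory and the requests already placed on a node
kubectl describe node NODE_NAME | grep -A 8 "Allocated resources"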

Namespace Structure

In a microservices architecture, it's common to separate namespaces by service.

# Namespace configuration example
- default              # Default namespace
- service-a            # Microservice A
- service-b            # Microservice B
- infrastructure       # Common infrastructure like Redis, Kafka
- monitoring           # Prometheus, Grafana
- logging              # ELK Stack

Check namespaces:

kubectl get namespaces
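
Namespaces can also be managed declaratively; a minimal sketch for one of the namespaces above (the team label and file name are illustrative):

apiVersion: v1
kind: Namespace
metadata:
  name: service-a
  labels:
    team: backend   # illustrative label for ownership/filtering

Apply it with kubectl apply -f namespace.yaml.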

Deployment Manifests

Basic Deployment

A basic Deployment configuration example for a Spring Boot application.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
  namespace: my-namespace
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
        version: v1.0.0
    spec:
      containers:
      - name: my-service
        image: your-registry/my-service:v1.0.0
        ports:
        - containerPort: 8080
        env:
        - name: SPRING_PROFILES_ACTIVE
          value: "production"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: my-service-secret
              key: database-url
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 2Gi
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
        startupProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 24

Service

apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: my-namespace
spec:
  selector:
    app: my-service
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP
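
Since a ClusterIP Service is not reachable from outside the cluster, a quick smoke test can go through port-forwarding (ports match the manifest above):

# Forward local port 8080 to the Service's port 80
kubectl port-forward svc/my-service 8080:80 -n my-namespace

# In another terminal
curl http://localhost:8080/actuator/health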

ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-service-config
  namespace: my-namespace
data:
  application.yaml: |
    server:
      port: 8080
    spring:
      datasource:
        hikari:
          maximum-pool-size: 10
      jpa:
        hibernate:
          ddl-auto: validate
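
For the application to see this file, the ConfigMap must be mounted into the Pod. A sketch of the relevant Deployment fragment; the mount path, and the assumption that Spring is pointed at it (e.g. via SPRING_CONFIG_ADDITIONAL_LOCATION), are illustrative:

# Fragment of spec.template.spec in the Deployment
containers:
- name: my-service
  volumeMounts:
  - name: config
    mountPath: /app/config   # assumed location the app reads config from
volumes:
- name: config
  configMap:
    name: my-service-config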

Secret

apiVersion: v1
kind: Secret
metadata:
  name: my-service-secret
  namespace: my-namespace
type: Opaque
stringData:
  database-url: jdbc:postgresql://postgres:5432/mydb
  database-username: admin
  database-password: supersecret
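
Note that stringData is only base64-encoded (not encrypted) once stored, so this manifest should not be committed to Git as-is; creating the Secret imperatively is one alternative:

kubectl create secret generic my-service-secret \
  -n my-namespace \
  --from-literal=database-url='jdbc:postgresql://postgres:5432/mydb' \
  --from-literal=database-username='admin' \
  --from-literal=database-password='supersecret'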

HPA (Horizontal Pod Autoscaler)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
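
Resource-based HPA requires the metrics-server to be installed; the autoscaler's current state and recent scaling decisions can be checked with:

kubectl get hpa my-service-hpa -n my-namespace
kubectl describe hpa my-service-hpa -n my-namespace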

Ingress

An Ingress configuration example for exposing services inside the cluster to external traffic.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: my-namespace
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx  # replaces the deprecated kubernetes.io/ingress.class annotation
  tls:
  - hosts:
    - api.example.com
    secretName: api-tls
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/v1
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80
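
Once the controller assigns an address, the Ingress can be verified even before DNS is set up (INGRESS_IP is a placeholder for the assigned address):

kubectl get ingress api-ingress -n my-namespace

# Send a request with the expected Host header
curl -k -H "Host: api.example.com" https://INGRESS_IP/api/v1/health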

Essential Commands

# Check pods
kubectl get pods -n my-namespace

# Check pod logs
kubectl logs -f POD_NAME -n my-namespace

# Restart pods
kubectl rollout restart deployment/my-service -n my-namespace

# Apply configuration
kubectl apply -f deployment.yaml

# Access pod
kubectl exec -it POD_NAME -n my-namespace -- /bin/bash

# Check resource usage
kubectl top pods -n my-namespace
kubectl top nodes

# Check events
kubectl get events -n my-namespace --sort-by='.lastTimestamp'

Deployment Strategies

Rolling Update (Default)

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Number of extra Pods that may be created above the desired count
      maxUnavailable: 0  # Number of Pods that may be unavailable during the update
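
The rollout can be watched and, if needed, reverted:

# Watch the rollout until it completes
kubectl rollout status deployment/my-service -n my-namespace

# Roll back to the previous revision
kubectl rollout undo deployment/my-service -n my-namespace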

Blue-Green Deployment

# Deploy Blue version (existing)
kubectl apply -f blue-deployment.yaml

# Deploy Green version (new)
kubectl apply -f green-deployment.yaml

# Switch Service to Green
kubectl patch service my-service -p '{"spec":{"selector":{"version":"green"}}}'

# Rollback to Blue if issues occur
kubectl patch service my-service -p '{"spec":{"selector":{"version":"blue"}}}'

JVM Warm-up Configuration

A startupProbe configuration that mitigates JVM cold-start issues:

startupProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 24  # Wait up to 2 minutes

For more details, see the separate post on JVM warm-up.

Resource Optimization

CPU/Memory Requests vs Limits

resources:
  requests:
    cpu: 500m      # Minimum guaranteed resources
    memory: 1Gi
  limits:
    cpu: 2000m     # Maximum available resources
    memory: 2Gi

Recommendations:

  • requests should be roughly 80-90% of observed usage (the VPA sketch below helps measure this)
  • limits should be based on peak usage
  • CPU limits should be set carefully, since hitting them causes throttling
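
To ground these numbers in observed usage, the Vertical Pod Autoscaler can run in recommendation-only mode (this sketch assumes the VPA components are installed in the cluster):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: my-namespace
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  updatePolicy:
    updateMode: "Off"   # only compute recommendations, never evict Pods

Recommendations then appear under kubectl describe vpa my-service-vpa -n my-namespace.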

Troubleshooting

Pod Won’t Start

# Check pod status
kubectl describe pod POD_NAME -n my-namespace

# Check events
kubectl get events -n my-namespace

# Check logs
kubectl logs POD_NAME -n my-namespace

OOMKilled Occurs

When a Pod's container is killed for exceeding its memory limit:

# Check memory usage
kubectl top pod POD_NAME -n my-namespace

# Increase Deployment memory
kubectl set resources deployment/my-service \
  --limits=memory=4Gi \
  --requests=memory=2Gi \
  -n my-namespace
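
To confirm the kill was actually an OOM before resizing, the container's last state can be inspected:

kubectl get pod POD_NAME -n my-namespace \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
# Prints OOMKilled if the container exceeded its memory limit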

Security

1. RBAC (Role-Based Access Control)

Apply principle of least privilege:

# ❌ BAD: Overly broad permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: developer
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]  # Grant all permissions

---
# ✅ GOOD: Grant only minimum necessary permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: my-namespace
rules:
# Pod read-only
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]

# Deployment view only
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch"]

# Service view only
- apiGroups: [""]
  resources: ["services"]
  verbs: ["get", "list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: my-namespace
subjects:
- kind: User
  name: developer@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
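
Whether the binding behaves as intended can be verified with impersonation:

# Should succeed
kubectl auth can-i list pods --as developer@example.com -n my-namespace

# Should be denied
kubectl auth can-i delete deployments --as developer@example.com -n my-namespace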

2. Pod Security Standards

Apply Pod security policies:

# Apply Pod Security Standards at namespace level
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

---
# ✅ GOOD: Security-hardened Pod configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
  namespace: my-namespace
spec:
  template:
    spec:
      # Enforce non-root user
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
        seccompProfile:
          type: RuntimeDefault

      containers:
      - name: my-service
        image: my-service:v1.0.0
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
          runAsNonRoot: true
          runAsUser: 1000

        # Temporary volumes needed when using read-only root filesystem
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache

      volumes:
      - name: tmp
        emptyDir: {}
      - name: cache
        emptyDir: {}
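
Before enforcing the restricted profile on an existing namespace, a server-side dry run shows which running Pods would violate it:

kubectl label --dry-run=server --overwrite ns my-namespace \
  pod-security.kubernetes.io/enforce=restricted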

3. Secret Encryption and Management

# ❌ BAD: Storing secrets in plaintext
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
type: Opaque
data:
  password: bXlwYXNzd29yZDEyMw==  # base64 is not encryption!

---
# ✅ GOOD: Using External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-secret
  namespace: my-namespace
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: db-secret
    creationPolicy: Owner
  data:
  - secretKey: password
    remoteRef:
      key: database/prod/password
  - secretKey: username
    remoteRef:
      key: database/prod/username
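
With the External Secrets Operator installed, the sync status of the generated Secret can be checked with:

kubectl get externalsecret db-secret -n my-namespace
# The READY/STATUS columns show whether the sync from the backing store succeeded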

4. Network Policy

Default deny + explicit allow:

# 1. Deny all traffic by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-namespace
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

---
# 2. Explicitly allow only necessary traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-service-netpol
  namespace: my-namespace
spec:
  podSelector:
    matchLabels:
      app: my-service
  policyTypes:
  - Ingress
  - Egress

  # Ingress: Allow access only from API Gateway
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: api-gateway
      podSelector:
        matchLabels:
          app: gateway
    ports:
    - protocol: TCP
      port: 8080

  # Egress: Allow outgoing traffic only to necessary services
  egress:
  # Allow DNS
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53

  # Allow Database access
  - to:
    - namespaceSelector:
        matchLabels:
          name: database
      podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432
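
A quick way to test the policy is a throwaway Pod that does not carry the allowed gateway labels (image and timeout are illustrative):

# Should time out: the test Pod doesn't match the allowed ingress rule
kubectl run netpol-test --rm -it --restart=Never \
  --image=busybox:1.36 -n default -- \
  wget -qO- -T 3 http://my-service.my-namespace/actuator/health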

5. Image Pull Policy and Verification

# ✅ GOOD: Image security configuration
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      # Credentials for the private registry (restricting which registries
      # are allowed requires an admission policy, not imagePullSecrets)
      imagePullSecrets:
      - name: my-registry-secret

      containers:
      - name: my-service
        # Use SHA256 digest instead of tag (guarantees immutability)
        image: gcr.io/my-project/my-service@sha256:abc123...
        imagePullPolicy: Always  # Always consult the registry instead of the node's cache

        # Don't inject sensitive info via environment variables
        env:
        - name: DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: password
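
The digest that is actually running can be read back from the cluster (registry tools such as crane or skopeo can resolve it ahead of deployment):

kubectl get pods -n my-namespace \
  -o jsonpath='{.items[0].status.containerStatuses[0].imageID}'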

Performance

1. Resource Requests/Limits Optimization

# ❌ BAD: No resource settings (risk of node resource exhaustion)
spec:
  containers:
  - name: my-service
    image: my-service:v1.0.0

---
# ❌ BAD: Too large gap between Request and Limit
spec:
  containers:
  - name: my-service
    image: my-service:v1.0.0
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 8000m      # 80x difference
        memory: 16Gi    # 128x difference

---
# ✅ GOOD: Appropriate Resource settings
spec:
  containers:
  - name: my-service
    image: my-service:v1.0.0
    resources:
      requests:
        cpu: 500m        # Minimum guaranteed CPU
        memory: 1Gi      # Minimum guaranteed memory
      limits:
        cpu: 2000m       # Maximum 4x (allow burst)
        memory: 2Gi      # Maximum 2x

# Recommended ratios:
# - CPU: 2-4x of requests
# - Memory: 1.5-2x of requests (prevent OOM)

2. HPA (Horizontal Pod Autoscaler) Configuration

# ✅ GOOD: Auto scaling based on CPU and memory
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
  namespace: my-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service

  minReplicas: 3    # Minimum Pod count (high availability)
  maxReplicas: 10   # Maximum Pod count (cost control)

  metrics:
  # CPU-based scaling
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale out when CPU reaches 70%

  # Memory-based scaling
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # Scale out when Memory reaches 80%

  # Custom metric (RPS)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Stabilize for 5 min before scaling down
      policies:
      - type: Percent
        value: 50  # Reduce by 50% at a time
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0  # Scale up immediately
      policies:
      - type: Percent
        value: 100  # Can grow by 100% of current replicas per period
        periodSeconds: 15
      - type: Pods
        value: 4  # Or add maximum 4 Pods at once
        periodSeconds: 15
      selectPolicy: Max  # Use whichever policy allows the larger change

3. Node Affinity and Pod Topology

# ✅ GOOD: Performance optimization with Node Affinity
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      # Deploy only to specific node groups
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              # Deploy only to high-performance nodes
              - key: node-type
                operator: In
                values:
                - high-performance
              # Deploy only to specific zones (data locality)
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - asia-northeast3-a
                - asia-northeast3-b

        # Pod Anti-Affinity: prefer spreading replicas across different nodes (high availability)
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: my-service
              topologyKey: kubernetes.io/hostname

      # Topology Spread Constraints: Even distribution across zones
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: my-service
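
The zone labels are set by the cloud provider, but node-type is a custom key that must be applied to the nodes beforehand:

# Label a high-performance node (illustrative key/value)
kubectl label node NODE_NAME node-type=high-performance

# Inspect the labels the affinity rules match against
kubectl get nodes -L node-type,topology.kubernetes.io/zone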

4. Efficient Probe Configuration

spec:
  containers:
  - name: my-service
    # ✅ GOOD: Appropriate Probe settings
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3        # Restart after 30 seconds

    readinessProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5            # Check more frequently
      timeoutSeconds: 3
      successThreshold: 1         # Forward traffic after 1 success
      failureThreshold: 2         # Block traffic after 10 seconds

    startupProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
      failureThreshold: 30        # Wait up to 150 seconds (allow slow start)
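
Spring Boot (2.3+) exposes these liveness/readiness groups automatically when it detects it is running on Kubernetes; they can be checked by hand through a port-forward:

kubectl port-forward deploy/my-service 8080:8080 -n my-namespace &
curl http://localhost:8080/actuator/health/liveness
curl http://localhost:8080/actuator/health/readiness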

Reliability

1. PDB (Pod Disruption Budget)

# ✅ GOOD: Guarantee minimum availability during voluntary disruptions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
  namespace: my-namespace
spec:
  minAvailable: 2  # Always keep at least 2 Pods available
  # Or
  # maxUnavailable: 1  # Allow maximum 1 Pod disruption
  selector:
    matchLabels:
      app: my-service
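
Node drains and other voluntary evictions consult this budget; the current allowance is visible via:

kubectl get pdb my-service-pdb -n my-namespace
# ALLOWED DISRUPTIONS shows how many Pods may be evicted right now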

2. Rolling Update Strategy

# ✅ GOOD: Progressive rollout
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 5
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # Can create 1 additional Pod
      maxUnavailable: 0   # No Pod disruption (zero-downtime deployment)

  minReadySeconds: 10     # A new Pod must stay Ready for 10s before it counts as available
  progressDeadlineSeconds: 600  # Fail if not completed within 10 minutes

  revisionHistoryLimit: 5 # Keep last 5 versions (for rollback)
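
The kept revisions can be listed and targeted when rolling back (revision 3 is illustrative):

kubectl rollout history deployment/my-service -n my-namespace
kubectl rollout undo deployment/my-service --to-revision=3 -n my-namespace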

3. Monitoring and Alerts

# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-service
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app: my-service
  endpoints:
  - port: metrics
    interval: 30s
    path: /actuator/prometheus

---
# PrometheusRule: Alert rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: my-service-alerts
  namespace: my-namespace
spec:
  groups:
  - name: my-service
    interval: 30s
    rules:
    # Alert when Pod count is below 2
    - alert: LowPodCount
      expr: count(kube_pod_status_phase{namespace="my-namespace",phase="Running"}) < 2
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod count is below minimum"

    # Alert when error rate is above 5%
    - alert: HighErrorRate
      expr: |
        sum(rate(http_requests_total{status=~"5..",namespace="my-namespace"}[5m])) /
        sum(rate(http_requests_total{namespace="my-namespace"}[5m])) > 0.05
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Error rate is above 5%"

Checklist

Security

  • Are minimum necessary permissions granted with RBAC?
  • Are Pod Security Standards applied?
  • Are Secrets managed through an external secret manager?
  • Is traffic restricted with Network Policy?
  • Are containers running as non-root user?
  • Are image digests being used?

Performance

  • Are Resource Requests/Limits properly configured?
  • Is HPA configured?
  • Is Node Affinity properly configured?
  • Are Probe settings optimized?

Reliability

  • Is PDB configured?
  • Is Rolling Update strategy appropriate?
  • Are there at least 3 replicas?
  • Are monitoring and alerts configured?
  • Is Revision History maintained?