AI API Proxy Kubernetes Deployment Guide

Complete production-ready guide to deploying AI API proxies on Kubernetes. Includes Helm charts, HPA autoscaling, service mesh integration, and 2026 best practices for high-availability enterprise deployments.

Production Kubernetes Architecture

A well-architected Kubernetes deployment ensures scalability, reliability, and maintainability for your AI API proxy infrastructure.

1. Ingress Controller

NGINX or Traefik routing traffic to API services with SSL termination and rate limiting.

2. Service Mesh (Istio/Linkerd)

Advanced traffic management, observability, and security policies across all services.

3. Horizontal Pod Autoscaler

Dynamic scaling based on CPU, memory, or custom metrics to handle varying API loads.

4. StatefulSets + Persistent Storage

Reliable state management for API rate limiting, caching, and session data.
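Component 4 is typically backed by a small Redis StatefulSet holding rate-limit counters and cached responses. A minimal sketch, assuming the stock `redis:7-alpine` image and your cluster's default storage class (both are illustrative choices, not requirements of the proxy):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: proxy-state-redis
  namespace: production
spec:
  serviceName: proxy-state-redis
  replicas: 3
  selector:
    matchLabels:
      app: proxy-state-redis
  template:
    metadata:
      labels:
        app: proxy-state-redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
        volumeMounts:
        - name: data
          mountPath: /data
  # Each replica gets its own PersistentVolumeClaim for durable state
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```

A headless Service named `proxy-state-redis` is also needed so each pod gets a stable DNS identity.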

Step-by-Step Production Deployment

1. Helm Chart Configuration

Create a production-ready Helm chart for your AI API proxy:

values.yaml
# AI API Proxy Helm Values
replicaCount: 3
image:
  repository: ai-proxy/api-gateway
  tag: v2.6.0
  pullPolicy: IfNotPresent

resources:
  limits:
    cpu: "1000m"
    memory: "2Gi"
  requests:
    cpu: "200m"
    memory: "512Mi"

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilization: 70
  targetMemoryUtilization: 80

service:
  type: ClusterIP
  port: 8080

ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix

config:
  openai:
    apiKeySecret: openai-secrets
    rateLimit: 5000
  anthropic:
    apiKeySecret: anthropic-secrets
    rateLimit: 3000
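The `config` block above is rendered by the chart's templates. A hypothetical `templates/configmap.yaml` showing how the per-provider rate limits could be wired through (the `ai-proxy.fullname` helper and key names are assumptions about this chart's layout):

```yaml
# templates/configmap.yaml (hypothetical template for this chart)
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "ai-proxy.fullname" . }}-config
data:
  # Values flow in from the config: block of values.yaml
  openai-rate-limit: {{ .Values.config.openai.rateLimit | quote }}
  anthropic-rate-limit: {{ .Values.config.anthropic.rateLimit | quote }}
```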

2. Kubernetes Manifests

Deployment manifest with liveness and readiness probes:

deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-api-proxy
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-api-proxy
  template:
    metadata:
      labels:
        app: ai-api-proxy
    spec:
      containers:
      - name: ai-proxy
        image: ai-proxy/api-gateway:v2.6.0
        ports:
        - containerPort: 8080
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: openai-secrets
              key: apiKey
        - name: ANTHROPIC_API_KEY
          valueFrom:
            secretKeyRef:
              name: anthropic-secrets
              key: apiKey
        resources:
          requests:
            memory: "512Mi"
            cpu: "200m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
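The deployment assumes the `openai-secrets` and `anthropic-secrets` Secrets already exist in the `production` namespace. A sketch of creating them declaratively, with placeholder values only (never commit real keys to version control):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openai-secrets
  namespace: production
type: Opaque
stringData:
  apiKey: "<your-openai-api-key>"   # placeholder, replace at deploy time
---
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-secrets
  namespace: production
type: Opaque
stringData:
  apiKey: "<your-anthropic-api-key>"   # placeholder, replace at deploy time
```

In practice, prefer sourcing these from an external secret manager (e.g. External Secrets Operator or sealed-secrets) rather than plain manifests.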

3. Service Mesh Integration

Istio VirtualService for advanced traffic management. Note that the fault block on the anthropic route injects a 7s delay into 0.1% of requests; that is a resilience-testing setting and should be removed for normal production traffic:

virtualservice.yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: ai-api-proxy
spec:
  hosts:
  - ai-api.example.com
  gateways:
  - public-gateway
  http:
  - name: "openai-route"
    match:
    - headers:
        x-model-type:
          exact: "openai"
    route:
    - destination:
        host: ai-api-proxy
        port:
          number: 8080
    retries:
      attempts: 3
      perTryTimeout: 2s
    timeout: 30s
    
  - name: "anthropic-route"
    match:
    - headers:
        x-model-type:
          exact: "anthropic"
    route:
    - destination:
        host: ai-api-proxy
        port:
          number: 8080
    fault:
      delay:
        percentage:
          value: 0.1
        fixedDelay: 7s
    timeout: 60s
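The retry and timeout policies above pair naturally with a DestinationRule that limits connection load and ejects unhealthy pods. A sketch with illustrative thresholds (tune them against your actual traffic):

```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: ai-api-proxy
spec:
  host: ai-api-proxy
  trafficPolicy:
    connectionPool:
      http:
        http2MaxRequests: 1000
        maxRequestsPerConnection: 100
    # Temporarily remove pods that keep returning 5xx from the load-balancing pool
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
```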
4. HPA Configuration

Dynamic scaling based on CPU utilization and custom metrics (requests per second).

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-api-proxy-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-api-proxy
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: requests-per-second
      target:
        type: AverageValue
        averageValue: "1000"
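The `requests-per-second` Pods metric is not built into Kubernetes; it must be exposed through the custom metrics API, typically via prometheus-adapter. A sketch of the adapter rule, assuming the proxy exports a Prometheus counter named `http_requests_total` with `namespace` and `pod` labels (both the counter name and labels are assumptions about your application):

```yaml
# prometheus-adapter values fragment (hypothetical metric names)
rules:
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^http_requests_total$"
      as: "requests-per-second"
    # Convert the raw counter into a per-pod rate the HPA can average
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```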
5. Security Policies

Pod Security Admission and NetworkPolicy for secure cluster operations. PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25, so enforce the equivalent restrictions with the restricted Pod Security Standard plus a container-level securityContext.

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted

In the proxy Deployment, the same restrictions are expressed per container:

securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  seccompProfile:
    type: RuntimeDefault
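A NetworkPolicy complements the pod-level restrictions by limiting which peers can reach the proxy. A minimal sketch admitting traffic only from the ingress controller's namespace (the `ingress-nginx` namespace name is an assumption about your cluster; `kubernetes.io/metadata.name` is a label Kubernetes sets automatically):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-api-proxy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: ai-api-proxy
  policyTypes:
  - Ingress
  ingress:
  # Only the ingress controller namespace may reach the proxy port
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
```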

Monitoring and Observability

Metric              | Target        | Alert Threshold | Tool
--------------------|---------------|-----------------|------------------
CPU Utilization     | 70% average   | 90% for 5 min   | Prometheus
Memory Usage        | 80% average   | 90% for 5 min   | Grafana
Request Rate        | 1,000 req/s   | 2,000 req/s     | Prometheus
Error Rate          | 0.1%          | 1%              | Jaeger
Response Time (p95) | 500 ms        | 1,000 ms        | Kiali
Pod Restarts        | 0             | 3 in 10 min     | Kubernetes Events
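The alert thresholds above can be encoded declaratively as a PrometheusRule (requires the Prometheus Operator). A sketch covering two of the rows; the metric names `container_cpu_usage_seconds_total` and `kube_pod_container_resource_limits` come from cAdvisor and kube-state-metrics, while `http_requests_total` and the `job` label are assumptions about what the proxy exports:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ai-api-proxy-alerts
  namespace: production
spec:
  groups:
  - name: ai-api-proxy
    rules:
    # CPU above 90% of limits for 5 minutes
    - alert: HighCPUUtilization
      expr: |
        sum(rate(container_cpu_usage_seconds_total{pod=~"ai-api-proxy.*"}[5m]))
          / sum(kube_pod_container_resource_limits{pod=~"ai-api-proxy.*",resource="cpu"}) > 0.9
      for: 5m
      labels:
        severity: critical
    # 5xx responses above 1% of total traffic
    - alert: HighErrorRate
      expr: |
        sum(rate(http_requests_total{job="ai-api-proxy",status=~"5.."}[5m]))
          / sum(rate(http_requests_total{job="ai-api-proxy"}[5m])) > 0.01
      for: 5m
      labels:
        severity: critical
```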