AI API Proxy Kubernetes Deployment Guide
A production-ready guide to deploying AI API proxies on Kubernetes. Covers Helm charts, Horizontal Pod Autoscaler configuration, service mesh integration, and 2026 best practices for high-availability enterprise deployments.
Production Kubernetes Architecture
A well-architected Kubernetes deployment ensures scalability, reliability, and maintainability for your AI API proxy infrastructure.
Ingress Controller
NGINX or Traefik routing traffic to API services with SSL termination and rate limiting.
Service Mesh (Istio/Linkerd)
Advanced traffic management, observability, and security policies across all services.
Horizontal Pod Autoscaler
Dynamic scaling based on CPU, memory, or custom metrics to handle varying API loads.
StatefulSets + Persistent Storage
Reliable state management for API rate limiting, caching, and session data.
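As a concrete example of the ingress layer described above, the NGINX Ingress Controller handles TLS termination natively and exposes rate limiting through annotations. The hostname, TLS secret name, and limit values below are illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-api-proxy
  namespace: production
  annotations:
    # NGINX Ingress rate limiting: requests per second per client IP
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [api.example.com]
      secretName: api-example-com-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ai-api-proxy
                port:
                  number: 8080
```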
Step-by-Step Production Deployment
1. Helm Chart Configuration
Create a production-ready Helm chart for your AI API proxy:
# AI API Proxy Helm Values
replicaCount: 3

image:
  repository: ai-proxy/api-gateway
  tag: v2.6.0
  pullPolicy: IfNotPresent

resources:
  limits:
    cpu: "1000m"
    memory: "2Gi"
  requests:
    cpu: "200m"
    memory: "512Mi"

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilization: 70
  targetMemoryUtilization: 80

service:
  type: ClusterIP
  port: 8080

ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix

config:
  openai:
    apiKeySecret: openai-api-key
    rateLimit: 5000
  anthropic:
    apiKeySecret: anthropic-api-key
    rateLimit: 3000
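The chart values reference Kubernetes Secrets by name (openai-api-key and anthropic-api-key). Assuming those names and an apiKey data key, the Secrets can be created from a manifest like the following; the key material shown is a placeholder and should come from a secret store or CI pipeline in practice:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: openai-api-key
  namespace: production
type: Opaque
stringData:
  apiKey: "<your-openai-api-key>"  # placeholder
---
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-api-key
  namespace: production
type: Opaque
stringData:
  apiKey: "<your-anthropic-api-key>"  # placeholder
```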
2. Kubernetes Manifests
Deployment manifest with liveness and readiness probes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-api-proxy
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-api-proxy
  template:
    metadata:
      labels:
        app: ai-api-proxy
    spec:
      containers:
        - name: ai-proxy
          image: ai-proxy/api-gateway:v2.6.0
          ports:
            - containerPort: 8080
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-api-key
                  key: apiKey
            - name: ANTHROPIC_API_KEY
              valueFrom:
                secretKeyRef:
                  name: anthropic-api-key
                  key: apiKey
          resources:
            requests:
              memory: "512Mi"
              cpu: "200m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
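Because the Deployment runs three replicas for availability, it is worth protecting them from voluntary disruptions (node drains, cluster upgrades) with a PodDisruptionBudget. A minimal sketch, assuming the same app label as above:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ai-api-proxy
  namespace: production
spec:
  # Keep at least 2 of the 3 replicas running during voluntary evictions
  minAvailable: 2
  selector:
    matchLabels:
      app: ai-api-proxy
```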
3. Service Mesh Integration
Istio VirtualService for advanced traffic management:
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ai-api-proxy
spec:
  hosts:
    - ai-api.example.com
  gateways:
    - public-gateway
  http:
    - name: "openai-route"
      match:
        - headers:
            x-model-type:
              exact: "openai"
      route:
        - destination:
            host: ai-api-proxy
            port:
              number: 8080
      retries:
        attempts: 3
        perTryTimeout: 2s
      timeout: 30s
    - name: "anthropic-route"
      match:
        - headers:
            x-model-type:
              exact: "anthropic"
      route:
        - destination:
            host: ai-api-proxy
            port:
              number: 8080
      fault:
        # Fault injection for resilience testing: delays 0.1% of requests by 7s.
        # Remove this block for normal production traffic.
        delay:
          percentage:
            value: 0.1
          fixedDelay: 7s
      timeout: 60s
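A VirtualService only controls routing; pairing it with a DestinationRule adds connection pooling and circuit breaking for the upstream proxy pods, which matters when slow AI backends start piling up requests. The thresholds below are illustrative:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: ai-api-proxy
spec:
  host: ai-api-proxy
  trafficPolicy:
    connectionPool:
      http:
        http2MaxRequests: 1000
        maxRequestsPerConnection: 100
    outlierDetection:
      # Eject a pod from load balancing after 5 consecutive 5xx responses
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
```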
HPA Configuration
Dynamic scaling based on CPU utilization and custom metrics (requests per second).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-api-proxy-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-api-proxy
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: requests-per-second
        target:
          type: AverageValue
          averageValue: "1000"
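Note that the requests-per-second metric is only available if a custom metrics adapter (for example, prometheus-adapter) is installed in the cluster. The autoscaling/v2 API also supports a behavior stanza to dampen scale-down flapping, which is useful for bursty AI traffic; the windows below are illustrative:

```yaml
behavior:
  scaleDown:
    # Require 5 minutes of sustained low load before removing pods
    stabilizationWindowSeconds: 300
    policies:
      - type: Percent
        value: 50
        periodSeconds: 60
  scaleUp:
    # Scale up immediately on load spikes
    stabilizationWindowSeconds: 0
```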
Security Policies
PodSecurityPolicy was removed in Kubernetes v1.25, so current clusters enforce the same restrictions with Pod Security Admission labels on the namespace plus a hardened pod securityContext, combined with NetworkPolicy for network isolation.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Pod Security Admission: reject pods that violate the "restricted" profile
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
---
# Equivalent hardening in the Deployment's pod spec
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000
  seccompProfile:
    type: RuntimeDefault
containers:
  - name: ai-proxy
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
      readOnlyRootFilesystem: true
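The section description also mentions NetworkPolicy, but no example is shown. A sketch that restricts inbound traffic to the ingress controller's namespace and allows only DNS and HTTPS egress might look like the following; the ingress-nginx namespace name is an assumption:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ai-api-proxy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: ai-api-proxy
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - port: 8080
  egress:
    # DNS lookups
    - ports:
        - port: 53
          protocol: UDP
    # HTTPS egress to upstream AI provider APIs
    - ports:
        - port: 443
          protocol: TCP
```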
Monitoring and Observability
| Metric | Target | Alert Threshold | Tool |
|---|---|---|---|
| CPU Utilization | 70% average | 90% for 5min | Prometheus |
| Memory Usage | 80% average | 90% for 5min | Grafana |
| Request Rate | 1000 req/sec | 2000 req/sec | Prometheus |
| Error Rate | 0.1% | 1% | Prometheus |
| Response Time (p95) | 500ms | 1000ms | Jaeger |
| Pod Restarts | 0 | 3 in 10min | Kubernetes Events |
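Assuming metrics are scraped by a Prometheus Operator installation and the proxy exports an http_requests_total counter (the metric and label names here are assumptions about the proxy's instrumentation), the table's alert thresholds translate into rules like:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ai-api-proxy-alerts
  namespace: production
spec:
  groups:
    - name: ai-api-proxy
      rules:
        - alert: HighErrorRate
          # Error rate above 1% for 5 minutes (table threshold)
          expr: |
            sum(rate(http_requests_total{app="ai-api-proxy",status=~"5.."}[5m]))
              / sum(rate(http_requests_total{app="ai-api-proxy"}[5m])) > 0.01
          for: 5m
          labels:
            severity: critical
        - alert: HighCPUUtilization
          # CPU above 90% of limit for 5 minutes (table threshold)
          expr: |
            avg(rate(container_cpu_usage_seconds_total{pod=~"ai-api-proxy.*"}[5m])) > 0.9
          for: 5m
          labels:
            severity: warning
```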