Production Architecture
A production-grade AI API gateway deployment requires multiple components working together to ensure reliability, scalability, and security.
- 99.99% uptime SLA
- <100 ms P99 latency
- 10K+ requests/second
- 3+ availability zone (AZ) replication
Kubernetes Deployment
Deployment Manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  labels:
    app: api-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-gateway
  template:
    metadata:
      labels:
        app: api-gateway
    spec:
      containers:
        - name: gateway
          image: api-gateway:v2.1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "2000m"
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
          env:
            - name: GATEWAY_MODE
              value: "production"
            - name: LOG_LEVEL
              value: "info"
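The 3+ AZ replication target is not enforced by a replica count alone, because the scheduler may place all three pods in one zone. One way to spread them, sketched here as a fragment to merge into the Deployment's pod template, is a topology spread constraint. It assumes the cluster's nodes carry the standard `topology.kubernetes.io/zone` label.

```yaml
# Fragment of the Deployment spec: spread gateway pods evenly across zones.
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                # zones may differ by at most one pod
          topologyKey: topology.kubernetes.io/zone  # assumes nodes carry this label
          whenUnsatisfiable: DoNotSchedule          # leave pods pending rather than unbalance zones
          labelSelector:
            matchLabels:
              app: api-gateway
```

Setting `whenUnsatisfiable: ScheduleAnyway` instead trades strict balance for availability when a zone runs out of capacity.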
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Monitoring & Observability
Essential Metrics
- Request rate and latency percentiles (P50, P95, P99)
- Error rates by endpoint and provider
- Upstream API response times
- Cache hit/miss ratios
- Rate limiting rejections
- Active connections and connection pool usage
- Memory and CPU utilization
- Custom business metrics (tokens processed, cost per request)
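Request rate and latency percentiles are usually precomputed so that dashboards and alerts share one definition. A minimal sketch of Prometheus recording rules for them follows. It assumes the gateway exposes a `gateway_requests_total` counter and a `gateway_request_duration_seconds` histogram (with the conventional `_bucket` series); the recorded rule names are illustrative, not something the gateway defines.

```yaml
groups:
  - name: api-gateway-recording   # hypothetical group name
    rules:
      # Requests per second across all gateway pods
      - record: gateway:requests:rate5m
        expr: sum(rate(gateway_requests_total[5m]))
      # Latency percentiles computed from per-bucket rates, aggregated by le
      - record: gateway:request_duration_seconds:p50
        expr: histogram_quantile(0.50, sum by (le) (rate(gateway_request_duration_seconds_bucket[5m])))
      - record: gateway:request_duration_seconds:p95
        expr: histogram_quantile(0.95, sum by (le) (rate(gateway_request_duration_seconds_bucket[5m])))
      - record: gateway:request_duration_seconds:p99
        expr: histogram_quantile(0.99, sum by (le) (rate(gateway_request_duration_seconds_bucket[5m])))
```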
Prometheus Configuration
scrape_configs:
  - job_name: 'api-gateway'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: api-gateway
        action: keep
    metrics_path: /metrics
    scrape_interval: 15s
Alerting Rules
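groups:
  - name: api-gateway
    rules:
      - alert: HighErrorRate
        # Ratio of 5xx responses to all responses, so 0.05 means 5%
        expr: |
          sum(rate(gateway_requests_total{status=~"5.."}[5m]))
            / sum(rate(gateway_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: High error rate detected
      - alert: HighLatency
        # histogram_quantile needs per-bucket rates aggregated by le;
        # assumes the metric is a histogram exposing _bucket series
        expr: |
          histogram_quantile(0.99,
            sum by (le) (rate(gateway_request_duration_seconds_bucket[5m]))) > 1
        for: 10m
        labels:
          severity: warning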
Security Hardening
Production Security Checklist
- Enable TLS 1.2+ for all external communication
- Implement mTLS for service-to-service communication
- Use secrets management (Vault, AWS Secrets Manager)
- Enable network policies in Kubernetes
- Configure rate limiting per client and globally
- Enable audit logging for compliance
- Scan container images for vulnerabilities regularly
- Implement IP allowlisting for admin endpoints
Network Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-gateway-policy
spec:
  podSelector:
    matchLabels:
      app: api-gateway
  # The "name" namespace labels below must be applied by the cluster operator;
  # Kubernetes only sets kubernetes.io/metadata.name automatically.
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: istio-system
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: production
      ports:
        - protocol: TCP
          port: 443
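Because this policy restricts egress to port 443 in the `production` namespace, DNS lookups from the gateway pods would be blocked as well. Network policies are additive, so a second policy can allow DNS. The sketch below assumes cluster DNS runs in `kube-system` with the common `k8s-app: kube-dns` label; both should be checked against the actual cluster, and the policy name is illustrative.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-gateway-allow-dns   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: api-gateway
  policyTypes:
    - Egress
  egress:
    - to:
        # namespaceSelector and podSelector in one entry: both must match
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumption: label used by the cluster's DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Traffic to upstream provider APIs outside the cluster would need a similar explicit rule, for example one based on `ipBlock`.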
Never Hardcode Secrets
Use Kubernetes Secrets or an external secret manager instead of embedding credentials in manifests or images. Rotate API keys regularly and automate the rotation workflow.
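As a minimal sketch, a container can read a provider key from a Kubernetes Secret through `secretKeyRef` rather than a literal `value`. The variable name `PROVIDER_API_KEY`, the Secret name `api-gateway-secrets`, and the key `provider-api-key` are all hypothetical.

```yaml
# Fragment of the gateway container spec
env:
  - name: PROVIDER_API_KEY          # hypothetical variable name
    valueFrom:
      secretKeyRef:
        name: api-gateway-secrets   # hypothetical Secret in the same namespace
        key: provider-api-key       # hypothetical key within that Secret
```

The Secret itself can be created out of band or synced from an external manager, so the key never appears in the Deployment manifest stored in Git.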
Disaster Recovery
Backup Strategy
- Configuration: Store all configs in Git
- Secrets: Back up encrypted secrets to secure storage
- Metrics: Long-term metrics storage in managed Prometheus or similar
- Logs: Centralized log aggregation (ELK, Loki)
Failover Configuration
# Multi-region deployment
regions:
  primary:
    region: us-east-1
    replicas: 3
    priority: 100
  secondary:
    region: us-west-2
    replicas: 2
    priority: 50
failover:
  enabled: true
  health_check_interval: 10s
  auto_failover: true
  manual_approval: false