Why Docker Deployment?
Understanding the benefits of containerizing your LiteLLM proxy
Docker deployment provides consistency, isolation, and portability for your LiteLLM proxy. Containers ensure your proxy runs identically across development, staging, and production environments. This eliminates environment-specific issues and simplifies the deployment process significantly. Containerization also enables easy scaling, rolling updates, and efficient resource utilization through orchestration platforms like Kubernetes or Docker Swarm.
Consistent Environments
Eliminate "works on my machine" problems. Docker ensures your LiteLLM proxy runs identically everywhere by packaging all dependencies, runtime, and configuration into a single container image.
Isolation & Security
Containers provide process isolation, limiting the blast radius of potential security issues. Each LiteLLM instance runs in its own isolated environment with controlled resource access.
Fast Deployment
Deploy new instances in seconds rather than minutes. Docker images start quickly, enabling rapid scaling in response to traffic changes and fast recovery from failures.
Easy Updates
Perform rolling updates with zero downtime. Pull new images and restart containers without affecting running traffic, ensuring continuous availability of your AI services.
Resource Efficiency
Containers share the host OS kernel, making them lightweight compared to VMs. Run more LiteLLM instances on the same hardware, optimizing resource utilization and reducing costs.
Simplified CI/CD
Integrate Docker builds into your CI/CD pipeline for automated testing and deployment. Every code change can trigger container builds and deployments automatically.
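As one illustration, a minimal GitHub Actions workflow that builds and pushes an image on every push to `main` (the registry path, image name, and action versions here are placeholders to adapt, not part of LiteLLM itself):

```yaml
name: build-litellm-image

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Log in to the container registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Build and push the proxy image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ghcr.io/your-org/litellm-proxy:${{ github.sha }}
```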
Quick Start
Get LiteLLM running in Docker in under 5 minutes
Option 1: Use Official Image
```bash
# Pull the official LiteLLM image
docker pull ghcr.io/berriai/litellm:main-latest

# Run with environment variables
docker run -d \
  --name litellm-proxy \
  -p 4000:4000 \
  -e OPENAI_API_KEY=sk-your-key \
  -e ANTHROPIC_API_KEY=sk-ant-your-key \
  -e LITELLM_MASTER_KEY=sk-master-key \
  ghcr.io/berriai/litellm:main-latest

# Test the proxy
curl http://localhost:4000/health
```
Option 2: Custom Dockerfile
```dockerfile
# Use the Python slim image for a smaller footprint
FROM python:3.11-slim

# Set working directory
WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Install LiteLLM with proxy extras
RUN pip install --no-cache-dir "litellm[proxy]"

# Copy configuration file
COPY litellm_config.yaml /app/config.yaml

# Create non-root user
RUN useradd -m -u 1000 litellm && \
    chown -R litellm:litellm /app
USER litellm

# Expose port
EXPOSE 4000

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:4000/health || exit 1

# Run LiteLLM proxy
CMD ["litellm", "--config", "/app/config.yaml", "--port", "4000"]
```
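The Dockerfile above copies a `litellm_config.yaml` into the image. A minimal configuration might look like the following sketch; the model names and the Anthropic model ID are illustrative, so substitute the models you actually route:

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
```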
Using python:3.11-slim instead of full python image reduces container size from ~1GB to ~200MB. For even smaller images, consider using distroless or alpine-based images, though they require additional configuration for some LiteLLM dependencies.
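If image size matters further, a multi-stage build is one common approach: compile wheels in a full Python image, then install them into the slim runtime. A rough sketch under those assumptions:

```dockerfile
# Stage 1: build wheels in a full build environment
FROM python:3.11 AS builder
WORKDIR /wheels
RUN pip wheel --no-cache-dir --wheel-dir /wheels "litellm[proxy]"

# Stage 2: install from local wheels into the slim runtime
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels "litellm[proxy]" \
    && rm -rf /wheels
COPY litellm_config.yaml /app/config.yaml
EXPOSE 4000
CMD ["litellm", "--config", "/app/config.yaml", "--port", "4000"]
```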
Docker Compose Setup
Orchestrate LiteLLM with databases, caches, and monitoring
```yaml
version: '3.8'

services:
  litellm:
    build: .
    container_name: litellm-proxy
    ports:
      - "4000:4000"
    environment:
      - DATABASE_URL=postgresql://litellm:password@postgres:5432/litellm
      - REDIS_URL=redis://redis:6379
      - LITELLM_MASTER_KEY=${LITELLM_MASTER_KEY}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    volumes:
      - ./litellm_config.yaml:/app/config.yaml:ro
    depends_on:
      - postgres
      - redis
    restart: unless-stopped
    networks:
      - litellm-network

  postgres:
    image: postgres:15-alpine
    container_name: litellm-postgres
    environment:
      POSTGRES_DB: litellm
      POSTGRES_USER: litellm
      POSTGRES_PASSWORD: password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - litellm-network

  redis:
    image: redis:7-alpine
    container_name: litellm-redis
    volumes:
      - redis_data:/data
    networks:
      - litellm-network

volumes:
  postgres_data:
  redis_data:

networks:
  litellm-network:
    driver: bridge
```
Run with Docker Compose
```bash
# Create a .env file with secrets
cat > .env << EOF
LITELLM_MASTER_KEY=sk-your-master-key
OPENAI_API_KEY=sk-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-key
EOF

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f litellm

# Scale horizontally (first remove container_name and the fixed
# "4000:4000" host port from the litellm service, otherwise Compose
# cannot create additional replicas)
docker-compose up -d --scale litellm=3

# Stop services
docker-compose down
```
Kubernetes Deployment
Deploy LiteLLM to Kubernetes for production scalability
Create Namespace & Secrets
Set up a dedicated namespace and store API keys securely using Kubernetes Secrets.
- kubectl create namespace litellm
- kubectl create secret generic litellm-secrets
- Store API keys and master key
- Use sealed-secrets for GitOps
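The steps above, sketched as commands (the key values are placeholders):

```shell
# Dedicated namespace
kubectl create namespace litellm

# Store provider keys and the master key as a Secret
kubectl create secret generic litellm-secrets \
  --namespace litellm \
  --from-literal=OPENAI_API_KEY=sk-your-openai-key \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-your-key \
  --from-literal=LITELLM_MASTER_KEY=sk-your-master-key
```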
Deploy with ConfigMap
Create ConfigMap for LiteLLM configuration and Deployments for running instances.
- ConfigMap for config.yaml
- Deployment with replicas
- Resource limits and requests
- Liveness and readiness probes
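For example, the proxy configuration can be stored in a ConfigMap and mounted into the Deployment at `/app/config.yaml`; the embedded `config.yaml` below is a stub with an illustrative model entry:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: litellm-config
  namespace: litellm
data:
  config.yaml: |
    model_list:
      - model_name: gpt-4o
        litellm_params:
          model: openai/gpt-4o
          api_key: os.environ/OPENAI_API_KEY
```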
Configure Services
Create Service for internal communication and Ingress for external access.
- ClusterIP Service
- Ingress with TLS
- LoadBalancer for cloud
- Horizontal Pod Autoscaler
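A Horizontal Pod Autoscaler targeting the Deployment might look like the following; the replica bounds and CPU threshold are illustrative starting points, not tuned values:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: litellm-proxy
  namespace: litellm
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: litellm-proxy
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```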
Set Up Monitoring
Deploy Prometheus and Grafana for comprehensive monitoring and alerting.
- Prometheus metrics endpoint
- Grafana dashboards
- AlertManager rules
- Log aggregation with Loki
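If the proxy's Prometheus metrics endpoint is enabled in your LiteLLM configuration, a static Prometheus scrape job could look like this; the `/metrics` path, port, and in-cluster DNS name are assumptions to verify against your LiteLLM version and Service name:

```yaml
scrape_configs:
  - job_name: litellm-proxy
    metrics_path: /metrics
    static_configs:
      - targets: ['litellm-proxy.litellm.svc.cluster.local:4000']
```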
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-proxy
  namespace: litellm
spec:
  replicas: 3
  selector:
    matchLabels:
      app: litellm-proxy
  template:
    metadata:
      labels:
        app: litellm-proxy
    spec:
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-latest
          ports:
            - containerPort: 4000
          envFrom:
            - secretRef:
                name: litellm-secrets
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 4000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 4000
            initialDelaySeconds: 10
            periodSeconds: 5
```
Production Best Practices
Ensure reliability, security, and performance in production
Production Checklist
| Area | Requirement | Priority |
|---|---|---|
| Security | Use secrets management, not env vars in code | Critical |
| High Availability | Run minimum 3 replicas across availability zones | Critical |
| Monitoring | Implement metrics, logging, and alerting | Critical |
| Backups | Regular database backups with tested restore | High |
| Resource Limits | Set appropriate CPU and memory limits | High |
| SSL/TLS | Enable HTTPS with valid certificates | Critical |
| Rate Limiting | Implement to protect against quota exhaustion | High |
| Disaster Recovery | Document and test recovery procedures | High |
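For the rate-limiting item, LiteLLM's proxy config supports per-model request and token limits; a hedged sketch (the numbers are illustrative, and exact keys may vary by LiteLLM version, so verify against the proxy configuration docs):

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
      rpm: 500      # requests per minute for this deployment
      tpm: 200000   # tokens per minute for this deployment
```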
Never commit API keys or secrets to version control. Use Kubernetes Secrets, Docker secrets, or external secret management tools like HashiCorp Vault. Rotate credentials regularly and implement proper access controls for production environments.