πŸ“š Complete Setup Tutorial

How to Set Up LiteLLM Proxy

Master the installation and configuration of the LiteLLM proxy server. Learn to set up multi-provider support, load balancing, caching strategies, authentication, and production deployment for your AI applications with this step-by-step guide.

πŸ“… Updated March 2024 Β· ⏱️ 20 min read Β· πŸ“Š Difficulty: Beginner
⚑ Quick Install

Get LiteLLM running in under 5 minutes with pip installation

πŸ”§ Easy Config

Simple YAML configuration for multiple AI providers

πŸš€ Deploy Ready

Production deployment with Docker and Kubernetes support

Installation Guide

Follow these steps to install and configure LiteLLM proxy on your system

1. Prerequisites Check

Before installing LiteLLM, ensure your system meets the following requirements. LiteLLM requires Python 3.8 or higher and works best with virtual environments. Having these prerequisites in place ensures a smooth installation process and avoids common setup issues that many developers encounter during initial configuration.

  • Python 3.8+ installed on your system
  • pip package manager (usually comes with Python)
  • Virtual environment tool (venv or conda recommended)
  • API keys for your chosen LLM providers
  • At least 512MB of available RAM
Check Python Version Bash
# Check Python version (requires 3.8+)
python --version

# Create virtual environment
python -m venv litellm-env

# Activate virtual environment
source litellm-env/bin/activate   # Linux/Mac
litellm-env\Scripts\activate      # Windows
2. Install LiteLLM Package

Install LiteLLM using pip, the Python package manager. The installation includes all core dependencies needed to run the proxy server. For production deployments, we recommend installing the proxy extras, which add features such as rate limiting, caching, and authentication middleware that improve the security and performance of your deployment.

Install LiteLLM Bash
# Install LiteLLM with proxy support
pip install litellm[proxy]

# Or install basic version
pip install litellm

# Verify installation
litellm --version
3. Set Environment Variables

Configure your API keys and environment variables. LiteLLM uses environment variables to authenticate with various LLM providers. Store your API keys securely and never commit them to version control. Using environment variables provides better security and flexibility when deploying across different environments like development, staging, and production.

Environment Variables Bash
# OpenAI API key
export OPENAI_API_KEY="sk-your-openai-key"

# Anthropic API key
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"

# Azure OpenAI configuration
export AZURE_API_KEY="your-azure-key"
export AZURE_API_BASE="https://your-resource.openai.azure.com"

# Google AI / PaLM
export PALM_API_KEY="your-palm-key"
4. Create Configuration File

Create a configuration file to define your LLM providers, models, and proxy settings. The YAML configuration file provides a centralized way to manage all your settings, making it easy to version control and deploy across different environments. LiteLLM supports extensive configuration options for load balancing, fallbacks, rate limiting, and custom routing rules.

litellm_config.yaml YAML
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gpt-35
    litellm_params:
      model: azure/gpt-35-turbo
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

general_settings:
  master_key: sk-1234   # Proxy API key
  database: os.environ/DATABASE_URL
5. Start the Proxy Server

Launch your LiteLLM proxy server with the configuration file. The server will start on port 4000 by default and provide an OpenAI-compatible API endpoint that your applications can use. You can customize the port, enable authentication, configure logging, and set up monitoring through additional command-line flags or configuration options.

Start Server Bash
# Start proxy with config file
litellm --config litellm_config.yaml

# Start with custom port
litellm --config litellm_config.yaml --port 8080

# Start with detailed logging
litellm --config litellm_config.yaml --detailed_debug

# Start in production mode
litellm --config litellm_config.yaml --num_workers 4
πŸ’‘ Pro Tip

Your proxy is now running at http://localhost:4000. You can test it by making a request to http://localhost:4000/v1/chat/completions with your master key in the Authorization header. The proxy automatically routes requests to the appropriate LLM provider based on the model name specified in your configuration file.

Configuration Options

Explore the comprehensive configuration options available for LiteLLM proxy

Essential Configuration Parameters

  • master_key: API key for authenticating requests to the proxy server (default: None)
  • database: PostgreSQL connection URL for storing usage data and API keys (default: SQLite)
  • max_parallel_requests: Maximum number of concurrent requests to handle (default: 100)
  • request_timeout: Timeout in seconds for upstream LLM API requests (default: 600)
  • fallbacks: List of fallback models if the primary model fails (default: empty)
  • cache: Enable response caching for repeated queries (default: false)
  • success_callback: Webhook URL for success notifications (default: None)
  • failure_callback: Webhook URL for failure notifications (default: None)
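
These parameters live in the same config file as your model_list. The sketch below shows one way they might be laid out; section placement and option names can shift between LiteLLM versions, so treat it as a starting point rather than a definitive reference. Caching, fallbacks, and callbacks are covered in more detail under Key Features below.

Example Settings YAML
general_settings:
  master_key: sk-1234                 # key clients must send in the Authorization header
  database: os.environ/DATABASE_URL   # PostgreSQL URL; omit to fall back to SQLite
  max_parallel_requests: 100          # cap on concurrent requests

litellm_settings:
  request_timeout: 600                # seconds to wait on upstream LLM API calls
  cache: true                         # enable response caching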

Supported LLM Providers

LiteLLM supports 100+ models from multiple providers through a unified API

  • OpenAI: GPT-4, GPT-3.5, DALL-E, Whisper
  • Anthropic: Claude 3 Opus, Sonnet, Haiku
  • Azure OpenAI: GPT-4, GPT-3.5 (Azure)
  • Google AI: Gemini Pro, PaLM 2
  • AWS Bedrock: Claude, Llama 2, Titan
  • Cohere: Command, Embed, Summarize
  • Replicate: Llama 2, Mistral, Vicuna
  • HuggingFace: Open source models

Key Features

Discover the powerful features that make LiteLLM a popular choice for proxying AI traffic

πŸ”„ Automatic Load Balancing

Distribute requests across multiple providers and models automatically. LiteLLM intelligently routes traffic based on availability, cost, and performance metrics to optimize your AI operations.
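
One common way to express load balancing in the config file is to register several deployments under the same model_name; the proxy's router then spreads requests across them. The sketch below is illustrative, and the routing_strategy values are assumptions to verify against your LiteLLM version's documentation.

Load Balancing Config YAML
model_list:
  # Two deployments share the name "gpt-4"; the router balances requests between them
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

router_settings:
  routing_strategy: simple-shuffle   # alternatives may include least-busy or usage-based-routing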

⚑ Response Caching

Reduce API costs by up to 70% with intelligent response caching. Cache identical or similar queries and serve instant responses without hitting upstream providers, dramatically improving latency.
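
Caching is usually switched on under litellm_settings in the config file. The Redis-backed sketch below is an assumption about field names (cache_params, ttl); with no backend configured, an in-memory cache is typically used instead, so check your version's docs before relying on it.

Caching Config YAML
litellm_settings:
  cache: true
  cache_params:
    type: redis                # share the cache across proxy workers
    host: your-redis-host      # placeholder values; point at your Redis instance
    port: 6379
    password: your-redis-password
    ttl: 600                   # seconds before a cached response expires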

πŸ” Authentication & Security

Secure your proxy with API key authentication, rate limiting, and usage tracking. Set up virtual keys for different teams and control access with fine-grained permissions.

πŸ“Š Usage Analytics

Track token usage, costs, and performance metrics across all your AI applications. Generate detailed reports and set up budget alerts to control spending.
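
Usage data is typically forwarded to logging and alerting integrations through callbacks in the config. The integration names below (langfuse, sentry, slack) and the alerting fields are assumptions chosen for illustration; substitute whatever tooling you actually run.

Analytics Config YAML
litellm_settings:
  success_callback: ["langfuse"]   # log tokens, cost, and latency for successful calls
  failure_callback: ["sentry"]     # capture failed calls for debugging

general_settings:
  alerting: ["slack"]              # assumed: send budget / outage alerts to Slack
  alerting_threshold: 300          # assumed: seconds before a slow request triggers an alert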

πŸ›‘οΈ Fallback & Retry

Ensure high availability with automatic fallback to backup models and intelligent retry logic. Handle provider outages gracefully without affecting your applications.
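
Fallbacks map a primary model to an ordered list of backups, with retry counts set alongside them. Field placement (litellm_settings vs. router_settings) differs across versions, so treat this as a sketch that reuses the model names from the config created earlier.

Fallback Config YAML
litellm_settings:
  num_retries: 3                       # retry transient upstream errors before failing over
  request_timeout: 30                  # fail fast so fallbacks can kick in
  fallbacks:
    - gpt-4: ["claude-3", "gpt-35"]    # if gpt-4 fails, try claude-3, then gpt-35
  context_window_fallbacks:
    - gpt-35: ["gpt-4"]                # route oversized prompts to a larger-context model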

πŸ”Œ OpenAI-Compatible API

Use the standard OpenAI SDK and API format with any LLM provider. No code changes required when switching between models or providers in your applications.

Production Deployment

Choose the deployment method that best fits your infrastructure requirements

🐳 Docker Deployment

Deploy LiteLLM using Docker containers for easy setup, isolation, and scaling. Docker provides consistent environments across development and production, making it ideal for teams adopting containerization.

  • Quick setup with docker-compose (see the sketch below)
  • Easy scaling with multiple containers
  • Isolated dependencies
  • Portable across environments
  • Integration with CI/CD pipelines
Docker Run Bash
docker run -d \
  -p 4000:4000 \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
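
For the docker-compose setup mentioned in the list above, a minimal compose file might look like the following; the service name, mounted paths, and environment variables are illustrative and should be adapted to your own config.

docker-compose.yaml YAML
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    ports:
      - "4000:4000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    restart: unless-stopped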
☸️ Kubernetes Deployment

Deploy LiteLLM on Kubernetes for enterprise-grade scalability, high availability, and orchestration. Perfect for organizations running production AI workloads at scale with complex routing requirements.

  • Auto-scaling based on load
  • High availability with replicas
  • Rolling updates and rollbacks
  • Service mesh integration
  • Advanced monitoring and logging
Kubernetes Apply Bash
# Create namespace
kubectl create namespace litellm

# Deploy with Helm
helm install litellm ./litellm-helm \
  --namespace litellm \
  --set replicaCount=3
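
If you prefer raw manifests over Helm, a bare-bones Deployment might look like the sketch below. The Secret (litellm-api-keys) and ConfigMap (litellm-config) names are assumptions; create them yourself or adjust the references to match your cluster.

litellm-deployment.yaml YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
  namespace: litellm
spec:
  replicas: 3
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-latest
          args: ["--config", "/app/config.yaml", "--port", "4000"]
          ports:
            - containerPort: 4000
          envFrom:
            - secretRef:
                name: litellm-api-keys      # assumed Secret holding provider API keys
          volumeMounts:
            - name: config
              mountPath: /app/config.yaml
              subPath: config.yaml
      volumes:
        - name: config
          configMap:
            name: litellm-config            # assumed ConfigMap containing litellm_config.yaml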
⚠️ Production Considerations

For production deployments, always configure SSL/TLS certificates, set up proper authentication, implement rate limiting, enable logging and monitoring, use managed databases for persistence, and configure health checks and auto-restart policies. Test your deployment thoroughly before going live to ensure reliability and security.
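
As one concrete example of the health-check advice, Kubernetes probes can be added to the container spec from the Deployment sketch above. The /health/liveliness and /health/readiness paths are assumptions about the proxy's health endpoints; confirm them for your LiteLLM version before relying on them.

Health Probes YAML
# Add under the litellm container in the Deployment spec
livenessProbe:
  httpGet:
    path: /health/liveliness    # assumed liveness endpoint
    port: 4000
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health/readiness     # assumed readiness endpoint
    port: 4000
  initialDelaySeconds: 10
  periodSeconds: 15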