πŸ“š Complete Setup Tutorial

How to Set Up LiteLLM Proxy

Master the installation and configuration of the LiteLLM proxy server. Learn to set up multi-provider support, load balancing, caching strategies, authentication, and production deployment for your AI applications with this step-by-step guide.

πŸ“… Updated March 2024 Β· ⏱️ 20 min read Β· πŸ“Š Difficulty: Beginner
⚑ Quick Install

Get LiteLLM running in under 5 minutes with pip installation

πŸ”§ Easy Config

Simple YAML configuration for multiple AI providers

πŸš€ Deploy Ready

Production deployment with Docker and Kubernetes support

Installation Guide

Follow these steps to install and configure LiteLLM proxy on your system

1. Prerequisites Check

Before installing LiteLLM, ensure your system meets the following requirements. LiteLLM requires Python 3.8 or higher and works best with virtual environments. Having these prerequisites in place ensures a smooth installation process and avoids common setup issues that many developers encounter during initial configuration.

  • Python 3.8+ installed on your system
  • pip package manager (usually comes with Python)
  • Virtual environment tool (venv or conda recommended)
  • API keys for your chosen LLM providers
  • At least 512MB of available RAM
Check Python Version Bash
# Check Python version (requires 3.8+)
python --version

# Create virtual environment
python -m venv litellm-env

# Activate virtual environment
source litellm-env/bin/activate   # Linux/Mac
litellm-env\Scripts\activate      # Windows
2. Install LiteLLM Package

Install LiteLLM using pip, the Python package manager. The installation includes all core dependencies needed to run the proxy server. For production deployments, we recommend installing the proxy extras, which add features such as rate limiting, caching, and authentication middleware that improve the security and performance of your deployment.

Install LiteLLM Bash
# Install LiteLLM with proxy support
pip install litellm[proxy]

# Or install basic version
pip install litellm

# Verify installation
litellm --version
3. Set Environment Variables

Configure your API keys and environment variables. LiteLLM uses environment variables to authenticate with various LLM providers. Store your API keys securely and never commit them to version control. Using environment variables provides better security and flexibility when deploying across different environments like development, staging, and production.

Environment Variables Bash
# OpenAI API key
export OPENAI_API_KEY="sk-your-openai-key"

# Anthropic API key
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"

# Azure OpenAI configuration
export AZURE_API_KEY="your-azure-key"
export AZURE_API_BASE="https://your-resource.openai.azure.com"

# Google AI / PaLM
export PALM_API_KEY="your-palm-key"
4. Create Configuration File

Create a configuration file to define your LLM providers, models, and proxy settings. The YAML configuration file provides a centralized way to manage all your settings, making it easy to version control and deploy across different environments. LiteLLM supports extensive configuration options for load balancing, fallbacks, rate limiting, and custom routing rules.

litellm_config.yaml YAML
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-3
    litellm_params:
      model: anthropic/claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gpt-35
    litellm_params:
      model: azure/gpt-35-turbo
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

general_settings:
  master_key: sk-1234   # Proxy API key
  database: os.environ/DATABASE_URL
5. Start the Proxy Server

Launch your LiteLLM proxy server with the configuration file. The server will start on port 4000 by default and provide an OpenAI-compatible API endpoint that your applications can use. You can customize the port, enable authentication, configure logging, and set up monitoring through additional command-line flags or configuration options.

Start Server Bash
# Start proxy with config file
litellm --config litellm_config.yaml

# Start with custom port
litellm --config litellm_config.yaml --port 8080

# Start with detailed logging
litellm --config litellm_config.yaml --detailed_debug

# Start in production mode
litellm --config litellm_config.yaml --num_workers 4
πŸ’‘ Pro Tip

Your proxy is now running at http://localhost:4000. You can test it by making a request to http://localhost:4000/v1/chat/completions with your master key in the Authorization header. The proxy automatically routes requests to the appropriate LLM provider based on the model name specified in your configuration file.

Configuration Options

Explore the comprehensive configuration options available for LiteLLM proxy

Essential Configuration Parameters

  • master_key: API key for authenticating requests to the proxy server (default: None)
  • database: PostgreSQL connection URL for storing usage data and API keys (default: SQLite)
  • max_parallel_requests: Maximum number of concurrent requests to handle (default: 100)
  • request_timeout: Timeout in seconds for upstream LLM API requests (default: 600)
  • fallbacks: List of fallback models if the primary model fails (default: empty)
  • cache: Enable response caching for repeated queries (default: false)
  • success_callback: Webhook URL for success notifications (default: None)
  • failure_callback: Webhook URL for failure notifications (default: None)
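
These parameters live in the same config file as your model_list. The sketch below shows one way they might be laid out; section placement and option names can shift between LiteLLM versions, so treat it as a starting point rather than a definitive reference. Caching, fallbacks, and callbacks are covered in more detail under Key Features below.

Example Settings YAML
general_settings:
  master_key: sk-1234                 # key clients must send in the Authorization header
  database: os.environ/DATABASE_URL   # PostgreSQL URL; omit to fall back to SQLite
  max_parallel_requests: 100          # cap on concurrent requests

litellm_settings:
  request_timeout: 600                # seconds to wait on upstream LLM API calls
  cache: true                         # enable response caching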

Supported LLM Providers

LiteLLM supports 100+ models from multiple providers through a unified API

  • OpenAI: GPT-4, GPT-3.5, DALL-E, Whisper
  • Anthropic: Claude 3 Opus, Sonnet, Haiku
  • Azure OpenAI: GPT-4, GPT-3.5 (Azure)
  • Google AI: Gemini Pro, PaLM 2
  • AWS Bedrock: Claude, Llama 2, Titan
  • Cohere: Command, Embed, Summarize
  • Replicate: Llama 2, Mistral, Vicuna
  • HuggingFace: Open source models

Key Features

Discover the powerful features that make LiteLLM a popular choice for proxying AI traffic

πŸ”„ Automatic Load Balancing

Distribute requests across multiple providers and models automatically. LiteLLM intelligently routes traffic based on availability, cost, and performance metrics to optimize your AI operations.
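
One common way to express load balancing in the config file is to register several deployments under the same model_name; the proxy's router then spreads requests across them. The sketch below is illustrative, and the routing_strategy values are assumptions to verify against your LiteLLM version's documentation.

Load Balancing Config YAML
model_list:
  # Two deployments share the name "gpt-4"; the router balances requests between them
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4-turbo-preview
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4
    litellm_params:
      model: azure/gpt-4
      api_key: os.environ/AZURE_API_KEY
      api_base: os.environ/AZURE_API_BASE

router_settings:
  routing_strategy: simple-shuffle   # alternatives may include least-busy or usage-based-routing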

⚑ Response Caching

Reduce API costs by up to 70% with intelligent response caching. Cache identical or similar queries and serve instant responses without hitting upstream providers, dramatically improving latency.
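
Caching is usually switched on under litellm_settings in the config file. The Redis-backed sketch below is an assumption about field names (cache_params, ttl); with no backend configured, an in-memory cache is typically used instead, so check your version's docs before relying on it.

Caching Config YAML
litellm_settings:
  cache: true
  cache_params:
    type: redis                # share the cache across proxy workers
    host: your-redis-host      # placeholder values; point at your Redis instance
    port: 6379
    password: your-redis-password
    ttl: 600                   # seconds before a cached response expires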

πŸ” Authentication & Security

Secure your proxy with API key authentication, rate limiting, and usage tracking. Set up virtual keys for different teams and control access with fine-grained permissions.

πŸ“Š Usage Analytics

Track token usage, costs, and performance metrics across all your AI applications. Generate detailed reports and set up budget alerts to control spending.
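
Usage data is typically forwarded to logging and alerting integrations through callbacks in the config. The integration names below (langfuse, sentry, slack) and the alerting fields are assumptions chosen for illustration; substitute whatever tooling you actually run.

Analytics Config YAML
litellm_settings:
  success_callback: ["langfuse"]   # log tokens, cost, and latency for successful calls
  failure_callback: ["sentry"]     # capture failed calls for debugging

general_settings:
  alerting: ["slack"]              # assumed: send budget / outage alerts to Slack
  alerting_threshold: 300          # assumed: seconds before a slow request triggers an alert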

πŸ›‘οΈ Fallback & Retry

Ensure high availability with automatic fallback to backup models and intelligent retry logic. Handle provider outages gracefully without affecting your applications.
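
Fallbacks map a primary model to an ordered list of backups, with retry counts set alongside them. Field placement (litellm_settings vs. router_settings) differs across versions, so treat this as a sketch that reuses the model names from the config created earlier.

Fallback Config YAML
litellm_settings:
  num_retries: 3                       # retry transient upstream errors before failing over
  request_timeout: 30                  # fail fast so fallbacks can kick in
  fallbacks:
    - gpt-4: ["claude-3", "gpt-35"]    # if gpt-4 fails, try claude-3, then gpt-35
  context_window_fallbacks:
    - gpt-35: ["gpt-4"]                # route oversized prompts to a larger-context model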

πŸ”Œ OpenAI-Compatible API

Use the standard OpenAI SDK and API format with any LLM provider. No code changes required when switching between models or providers in your applications.

Production Deployment

Choose the deployment method that best fits your infrastructure requirements

🐳 Docker Deployment

Deploy LiteLLM using Docker containers for easy setup, isolation, and scaling. Docker provides consistent environments across development and production, making it ideal for teams adopting containerization.

  • Quick setup with docker-compose (see the sketch below)
  • Easy scaling with multiple containers
  • Isolated dependencies
  • Portable across environments
  • Integration with CI/CD pipelines
Docker Run Bash
docker run -d \
  -p 4000:4000 \
  -v $(pwd)/litellm_config.yaml:/app/config.yaml \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  ghcr.io/berriai/litellm:main-latest \
  --config /app/config.yaml
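
For the docker-compose setup mentioned in the list above, a minimal compose file might look like the following; the service name, mounted paths, and environment variables are illustrative and should be adapted to your own config.

docker-compose.yaml YAML
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    ports:
      - "4000:4000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
    restart: unless-stopped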
☸️ Kubernetes Deployment

Deploy LiteLLM on Kubernetes for enterprise-grade scalability, high availability, and orchestration. Perfect for organizations running production AI workloads at scale with complex routing requirements.

  • Auto-scaling based on load
  • High availability with replicas
  • Rolling updates and rollbacks
  • Service mesh integration
  • Advanced monitoring and logging
Kubernetes Apply Bash
# Create namespace
kubectl create namespace litellm

# Deploy with Helm
helm install litellm ./litellm-helm \
  --namespace litellm \
  --set replicaCount=3
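
If you prefer raw manifests over Helm, a bare-bones Deployment might look like the sketch below. The Secret (litellm-api-keys) and ConfigMap (litellm-config) names are assumptions; create them yourself or adjust the references to match your cluster.

litellm-deployment.yaml YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm
  namespace: litellm
spec:
  replicas: 3
  selector:
    matchLabels:
      app: litellm
  template:
    metadata:
      labels:
        app: litellm
    spec:
      containers:
        - name: litellm
          image: ghcr.io/berriai/litellm:main-latest
          args: ["--config", "/app/config.yaml", "--port", "4000"]
          ports:
            - containerPort: 4000
          envFrom:
            - secretRef:
                name: litellm-api-keys      # assumed Secret holding provider API keys
          volumeMounts:
            - name: config
              mountPath: /app/config.yaml
              subPath: config.yaml
      volumes:
        - name: config
          configMap:
            name: litellm-config            # assumed ConfigMap containing litellm_config.yaml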
⚠️ Production Considerations

For production deployments, always configure SSL/TLS certificates, set up proper authentication, implement rate limiting, enable logging and monitoring, use managed databases for persistence, and configure health checks and auto-restart policies. Test your deployment thoroughly before going live to ensure reliability and security.
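
As one concrete example of the health-check advice, Kubernetes probes can be added to the container spec from the Deployment sketch above. The /health/liveliness and /health/readiness paths are assumptions about the proxy's health endpoints; confirm them for your LiteLLM version before relying on them.

Health Probes YAML
# Add under the litellm container in the Deployment spec
livenessProbe:
  httpGet:
    path: /health/liveliness    # assumed liveness endpoint
    port: 4000
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health/readiness     # assumed readiness endpoint
    port: 4000
  initialDelaySeconds: 10
  periodSeconds: 15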