Claude Models Overview
Understanding the Claude 3 model family and their capabilities
Anthropic's Claude 3 model family represents a significant advancement in AI capabilities, offering three distinct models optimized for different use cases. Claude excels at complex reasoning, nuanced understanding, and maintaining safety while being genuinely helpful. When configuring your LLM proxy for Claude, understanding the differences between models helps you route requests appropriately and optimize for both performance and cost.
Claude 3 Opus
- Best for complex analysis
- 200K token context window
- Excellent at nuanced tasks
- Highest capability tier
- Ideal for research and strategy
Claude 3.5 Sonnet
- Best balance of speed and intelligence
- 200K token context window
- Great for coding tasks
- Cost-effective for scale
- Recommended for most use cases
Claude 3 Haiku
- Fastest response times
- 200K token context window
- Most cost-effective
- Great for simple tasks
- Ideal for high-volume apps
Start with Claude 3.5 Sonnet for most applications - it offers the best balance of capability, speed, and cost. Use Opus for complex reasoning tasks where quality matters more than cost. Use Haiku for high-volume, simple queries where speed and cost are priorities. Your proxy can automatically route to the appropriate model based on request complexity.
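The routing idea above can be sketched as a simple heuristic. The thresholds and keyword list below are illustrative assumptions for a proxy-side router, not Anthropic guidance:

```python
def choose_model(prompt: str) -> str:
    """Pick a Claude model from a rough complexity heuristic (illustrative only)."""
    hard_keywords = ("analyze", "prove", "strategy", "architecture")
    if len(prompt) > 4000 or any(k in prompt.lower() for k in hard_keywords):
        return "claude-3-opus-20240229"   # complex reasoning: quality over cost
    if len(prompt) < 200:
        return "claude-3-haiku-20240307"  # short, simple query: speed and cost
    return "claude-3-5-sonnet-20241022"   # sensible default for everything else
```

In production you would tune these rules against your own traffic; prompt length alone is a weak proxy for complexity.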
Proxy Configuration
Setting up your LLM proxy to work with Claude API
Obtain Anthropic API Key
Get your API key from the Anthropic console. The key starts with "sk-ant-" and authenticates all your Claude API requests. Store this securely using environment variables or a secrets management system.
- Visit console.anthropic.com
- Navigate to API Keys section
- Create new API key with appropriate permissions
- Store securely (never commit to version control)
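A minimal sketch of loading the key from an environment variable with a sanity check; the helper name `load_anthropic_key` is ours, not part of any SDK:

```python
import os

def load_anthropic_key() -> str:
    # Read the key from the environment so it never lands in version control.
    key = os.environ.get("ANTHROPIC_API_KEY", "")
    if not key.startswith("sk-ant-"):
        raise RuntimeError("ANTHROPIC_API_KEY is missing or malformed")
    return key
```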
Configure Proxy for Anthropic
Set up your proxy configuration to forward requests to Anthropic's API endpoint. The base URL for Anthropic API is https://api.anthropic.com/v1. Configure proper headers including the API key and required version header.
- Base URL: https://api.anthropic.com/v1
- Header: x-api-key for authentication
- Header: anthropic-version (required)
- Header: Content-Type: application/json
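Assembled as a dict, the required headers look like this; `2023-06-01` is a valid anthropic-version value, but check Anthropic's versioning docs for the one you want to pin:

```python
def anthropic_headers(api_key: str, version: str = "2023-06-01") -> dict:
    # The three headers Anthropic's Messages API expects on every request.
    return {
        "x-api-key": api_key,
        "anthropic-version": version,
        "content-type": "application/json",
    }
```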
Implement Message Format
Claude uses a specific message format different from OpenAI. Ensure your proxy transforms requests appropriately or configure clients to use Claude's native format. The messages API accepts system prompts and conversation history.
- System prompt as separate parameter
- Messages array with role and content
- Support for images in content blocks
- Tool use and function calling support
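A minimal sketch of the OpenAI-to-Anthropic transform a proxy might perform. It covers only the system-prompt extraction and the required max_tokens field, not images or tool calls:

```python
def to_anthropic(openai_body: dict) -> dict:
    """Move an OpenAI-style system message into Claude's top-level system param."""
    messages = openai_body["messages"]
    system = next((m["content"] for m in messages if m["role"] == "system"), None)
    body = {
        "model": openai_body["model"],
        # Anthropic requires max_tokens; pick a default if the client omits it.
        "max_tokens": openai_body.get("max_tokens") or 1024,
        "messages": [m for m in messages if m["role"] != "system"],
    }
    if system is not None:
        body["system"] = system
    return body
```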
Basic Configuration Example
```yaml
model_list:
  - model_name: claude-3-opus
    litellm_params:
      model: anthropic/claude-3-opus-20240229
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: claude-3-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: claude-3-haiku
    litellm_params:
      model: anthropic/claude-3-haiku-20240307
      api_key: os.environ/ANTHROPIC_API_KEY

general_settings:
  master_key: sk-your-proxy-master-key
  success_callback: ["prometheus"]
  failure_callback: ["slack"]
```
Python Client Example
```python
import anthropic
import os

# Initialize client pointing to your proxy
client = anthropic.Anthropic(
    api_key=os.environ["PROXY_API_KEY"],
    base_url="https://your-proxy.com/v1",
)

# Create a message
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a helpful coding assistant.",
    messages=[
        {
            "role": "user",
            "content": "Write a Python function to merge two sorted lists.",
        }
    ],
)

print(message.content[0].text)
```
Key Features
Leverage Claude's unique capabilities through your proxy
Feature Comparison
| Feature | Claude 3 Opus | Claude 3.5 Sonnet | Claude 3 Haiku |
|---|---|---|---|
| Context Window | 200K tokens | 200K tokens | 200K tokens |
| Vision Support | ✓ | ✓ | ✓ |
| Tool Use | ✓ | ✓ | ✓ |
| Prompt Caching | ✓ | ✓ | ✓ |
| Streaming | ✓ | ✓ | ✓ |
| Best For | Complex reasoning | Balanced tasks | Speed-critical |
Prompt Caching
```python
# Enable prompt caching for reduced latency and costs
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Large system prompt here...",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Your question here"}],
)

# Cache hit reduces cost by ~90% and latency significantly
print(message.usage)  # Shows cache read/write tokens
```
Best Practices
Optimize your Claude integration for production
Use System Prompts Effectively
Claude responds well to clear, detailed system prompts. Define the persona, task, and constraints explicitly. Claude follows instructions more reliably when the system prompt is passed via the dedicated system parameter rather than embedded in the messages array. Cache large system prompts for cost savings.
Implement Streaming for Better UX
Enable streaming for long responses to improve perceived performance. Claude's streaming implementation is robust and provides token-by-token output. Your proxy should pass through streaming responses without buffering.
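When verifying pass-through behavior, it helps to parse the stream the way a client would. The sketch below extracts text from Anthropic's SSE `content_block_delta` / `text_delta` events, skipping other event types; treat the exact event shapes as something to confirm against the streaming docs:

```python
import json

def extract_text(sse_lines):
    # Concatenate text_delta payloads from an Anthropic SSE stream,
    # ignoring other events (message_start, ping, message_stop, ...).
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        event = json.loads(line[len("data: "):])
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
    return "".join(parts)
```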
Handle Rate Limits Gracefully
Anthropic has rate limits based on your tier. Implement exponential backoff and retry logic in your proxy. Monitor rate limit headers in responses and adjust request patterns accordingly to avoid 429 errors.
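A minimal retry wrapper with exponential backoff and jitter; the exception type and delay constants are illustrative stand-ins (in practice you would catch the SDK's rate-limit error and respect any retry-after header):

```python
import random
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    # Retry on rate-limit style failures with exponential backoff and jitter.
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:  # stand-in for an HTTP 429 / RateLimitError
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.25))
```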
Choose the Right Model
Don't default to Opus for everything. Sonnet handles most tasks excellently at lower cost. Haiku is perfect for classification, summarization, and simple Q&A. Route requests intelligently based on complexity.