LLM Proxy for Local Development

Set up a local LLM proxy for faster iteration, reduced costs, and offline development. Perfect for prototyping, testing, and building AI-powered applications.


⚡ Quick Start

# Install and run a local LLM proxy in under 60 seconds

$ pip install litellm
$ litellm --model gpt-3.5-turbo

# Your proxy is now running at http://localhost:4000
# All OpenAI-compatible endpoints available
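
To check the proxy from application code, point an OpenAI-style client at it. A minimal sketch, assuming the official openai Python package; the api_key value is a placeholder, since a local proxy does not have to validate it:

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy
client = OpenAI(
    base_url="http://localhost:4000",  # proxy started above
    api_key="sk-local-dev",            # placeholder; may be ignored locally
)

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello from the local proxy!"}],
)
print(resp.choices[0].message.content)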

Local Development Options

Ollama

Easiest

Run LLMs locally with a single command. Ideal for quick prototyping and offline development.

  • One-command model download
  • No internet required after setup
  • Cross-platform support
  • REST API included
  • Apple Silicon optimized

# Install
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama2

# API at http://localhost:11434
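
Once a model is pulled, you can exercise the REST API directly. A short sketch using the requests library against Ollama's /api/generate endpoint (non-streaming):

import requests

# Ask the local Ollama server for a one-shot completion
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # the generated text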

LocalAI

Full Featured

OpenAI-compatible API running entirely locally. Supports multiple model formats and GPU acceleration.

  • OpenAI API compatible
  • GPU acceleration support
  • Multiple model formats
  • Image generation
  • Audio transcription

# Run with Docker
docker run -p 8080:8080 \
  -v $PWD/models:/models \
  localai/localai:latest

# API at http://localhost:8080
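
Because LocalAI implements the OpenAI API, the standard client works once you point it at the local server. A sketch; the model name here is an assumption and must match a model you have placed in ./models:

from openai import OpenAI

# LocalAI serves OpenAI-compatible endpoints under /v1
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="ggml-gpt4all-j",  # assumption: substitute your local model's name
    messages=[{"role": "user", "content": "Ping"}],
)
print(resp.choices[0].message.content)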

LiteLLM Mock

Testing

Mock mode for testing without API calls. Great for CI/CD and automated testing scenarios.

  • Zero API cost testing
  • Predictable responses
  • CI/CD friendly
  • Custom mock responses
  • Streaming mock support

# Start in mock mode
litellm --model gpt-3.5-turbo \
  --mock-response "Test response"

# All calls return mock data
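
Mock mode makes deterministic assertions straightforward in a test suite. A pytest-style sketch, assuming the proxy above is running with the mock response shown:

from openai import OpenAI

def test_chat_returns_canned_reply():
    # Talks to the locally running mock proxy; no real API call is made
    client = OpenAI(base_url="http://localhost:4000", api_key="sk-test")
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "anything"}],
    )
    assert resp.choices[0].message.content == "Test response"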

LM Studio

GUI

Desktop application with built-in local server. Visual interface for model management.

  • Polished GUI
  • Model browser & download
  • Built-in chat interface
  • Local server mode
  • Easy configuration

# Download from lmstudio.ai
# Start local server in app

# API at http://localhost:1234
# OpenAI-compatible endpoints
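
With the server running, you can confirm which models LM Studio exposes through the standard /v1/models endpoint. A quick sketch using requests:

import requests

# List the models served by LM Studio's local server
models = requests.get("http://localhost:1234/v1/models", timeout=10).json()
for model in models["data"]:
    print(model["id"])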

Development Workflow

1. Set Up Local Proxy

   Install your preferred local LLM proxy and configure it to match your production API structure for a seamless transition.

2. Configure Environment

   Point your application at the local proxy endpoint. Use environment variables to switch easily between local and production (see the sketch after this list).

3. Develop & Test

   Iterate quickly without API costs or rate limits. Test edge cases, error handling, and response parsing locally.

4. Deploy to Production

   Switch your environment variable to the production endpoint. Your code works identically with real LLM APIs.
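
A minimal sketch of the environment-variable switch from step 2, assuming the openai Python package; the LLM_BASE_URL and LLM_API_KEY names and their defaults are illustrative:

import os
from openai import OpenAI

# One environment variable flips the whole app between the local proxy
# and production; the defaults below target the local setup.
client = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "http://localhost:4000"),
    api_key=os.environ.get("LLM_API_KEY", "sk-local-dev"),
)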

Why Use a Local Proxy?

💰 Zero API Costs

Develop and test without burning through API credits.

⚡ Faster Iteration

No network latency means instant feedback on changes.

🔒 Privacy

Keep sensitive data local during development.

📴 Offline Development

Work anywhere, even without an internet connection.

🧪 Reliable Testing

Predictable, reproducible test results every time.

📊 No Rate Limits

Unlimited requests for development and testing.

Mock Response Examples

Simple Text Response

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "This is a mock response for testing."
    }
  }]
}

Streaming Response

data: {"choices":[{"delta":{"content":"Hello"}}]}

data: {"choices":[{"delta":{"content":" world"}}]}

data: [DONE]

Error Simulation

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Mock rate limit for testing"
  }
}
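
If none of the tools above fit, payloads like these are easy to serve from a hand-rolled stub. A hypothetical sketch using Flask, mirroring the OpenAI chat endpoint and the Quick Start port:

from flask import Flask, jsonify

app = Flask(__name__)

@app.post("/v1/chat/completions")
def chat_completions():
    # Always return the canned payload from the first example above
    return jsonify({
        "choices": [{
            "message": {
                "role": "assistant",
                "content": "This is a mock response for testing."
            }
        }]
    })

if __name__ == "__main__":
    app.run(port=4000)  # same port as the Quick Start proxy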

Option Comparison

Feature            Ollama       LocalAI          LiteLLM Mock  LM Studio
-----------------  -----------  ---------------  ------------  ----------
Setup Time         <5 min       10-15 min        <2 min        5-10 min
GUI                CLI only     CLI only         CLI only      ✓
GPU Support        ✓            ✓                —             ✓
OpenAI Compatible  ✓            ✓                ✓             ✓
Offline            ✓            ✓                ✓             ✓
Best For           Quick start  Production-like  CI/CD         Visual dev