LLM Proxy for Local Development

Set up a local LLM proxy for faster iteration, reduced costs, and offline development. Perfect for prototyping, testing, and building AI-powered applications.


⚡ Quick Start

# Install and run a local LLM proxy in under 60 seconds

$ pip install litellm
$ litellm --model gpt-3.5-turbo

# Your proxy is now running at http://localhost:4000
# All OpenAI-compatible endpoints available
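
To check the proxy from application code, point an OpenAI-style client at it. A minimal sketch, assuming the official openai Python package; the api_key value is a placeholder, since a local proxy does not have to validate it:

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy
client = OpenAI(
    base_url="http://localhost:4000",  # proxy started above
    api_key="sk-local-dev",            # placeholder; may be ignored locally
)

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello from the local proxy!"}],
)
print(resp.choices[0].message.content)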

Local Development Options

Ollama

Easiest

Run LLMs locally with a single command. Ideal for quick prototyping and offline development.

  • One-command model download
  • No internet required after setup
  • Cross-platform support
  • REST API included
  • Apple Silicon optimized

# Install
curl -fsSL https://ollama.com/install.sh | sh

# Run a model
ollama run llama2

# API at http://localhost:11434
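
Once a model is pulled, you can exercise the REST API directly. A short sketch using the requests library against Ollama's /api/generate endpoint (non-streaming):

import requests

# Ask the local Ollama server for a one-shot completion
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Why is the sky blue?", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # the generated text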

LocalAI

Full Featured

OpenAI-compatible API running entirely locally. Supports multiple model formats and GPU acceleration.

  • OpenAI API compatible
  • GPU acceleration support
  • Multiple model formats
  • Image generation
  • Audio transcription

# Run with Docker
docker run -p 8080:8080 \
  -v $PWD/models:/models \
  localai/localai:latest

# API at http://localhost:8080
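
Because LocalAI implements the OpenAI API, the standard client works once you point it at the local server. A sketch; the model name here is an assumption and must match a model you have placed in ./models:

from openai import OpenAI

# LocalAI serves OpenAI-compatible endpoints under /v1
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="ggml-gpt4all-j",  # assumption: substitute your local model's name
    messages=[{"role": "user", "content": "Ping"}],
)
print(resp.choices[0].message.content)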

LiteLLM Mock

Testing

Mock mode for testing without API calls. Great for CI/CD and automated testing scenarios.

  • Zero API cost testing
  • Predictable responses
  • CI/CD friendly
  • Custom mock responses
  • Streaming mock support

# Start in mock mode
litellm --model gpt-3.5-turbo \
  --mock-response "Test response"

# All calls return mock data
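
Mock mode makes deterministic assertions straightforward in a test suite. A pytest-style sketch, assuming the proxy above is running with the mock response shown:

from openai import OpenAI

def test_chat_returns_canned_reply():
    # Talks to the locally running mock proxy; no real API call is made
    client = OpenAI(base_url="http://localhost:4000", api_key="sk-test")
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "anything"}],
    )
    assert resp.choices[0].message.content == "Test response"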

LM Studio

GUI

Desktop application with built-in local server. Visual interface for model management.

  • Polished GUI
  • Model browser & download
  • Built-in chat interface
  • Local server mode
  • Easy configuration

# Download from lmstudio.ai
# Start local server in app

# API at http://localhost:1234
# OpenAI-compatible endpoints
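
With the server running, you can confirm which models LM Studio exposes through the standard /v1/models endpoint. A quick sketch using requests:

import requests

# List the models served by LM Studio's local server
models = requests.get("http://localhost:1234/v1/models", timeout=10).json()
for model in models["data"]:
    print(model["id"])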

Development Workflow

1. Set Up Local Proxy

   Install your preferred local LLM proxy and configure it to match your production API structure for a seamless transition.

2. Configure Environment

   Point your application at the local proxy endpoint. Use environment variables to switch easily between local and production (see the sketch after this list).

3. Develop & Test

   Iterate quickly without API costs or rate limits. Test edge cases, error handling, and response parsing locally.

4. Deploy to Production

   Switch your environment variable to the production endpoint. Your code works identically with real LLM APIs.
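
A minimal sketch of the environment-variable switch from step 2, assuming the openai Python package; the LLM_BASE_URL and LLM_API_KEY names and their defaults are illustrative:

import os
from openai import OpenAI

# One environment variable flips the whole app between the local proxy
# and production; the defaults below target the local setup.
client = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "http://localhost:4000"),
    api_key=os.environ.get("LLM_API_KEY", "sk-local-dev"),
)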

Why Use a Local Proxy?

💰 Zero API Costs

Develop and test without burning through API credits.

⚡ Faster Iteration

No network latency means instant feedback on changes.

🔒 Privacy

Keep sensitive data local during development.

📴 Offline Development

Work anywhere, even without an internet connection.

🧪 Reliable Testing

Predictable, reproducible test results every time.

📊 No Rate Limits

Unlimited requests for development and testing.

Mock Response Examples

Simple Text Response

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "This is a mock response for testing."
    }
  }]
}

Streaming Response

data: {"choices":[{"delta":{"content":"Hello"}}]}

data: {"choices":[{"delta":{"content":" world"}}]}

data: [DONE]

Error Simulation

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Mock rate limit for testing"
  }
}
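
If none of the tools above fit, payloads like these are easy to serve from a hand-rolled stub. A hypothetical sketch using Flask, mirroring the OpenAI chat endpoint and the Quick Start port:

from flask import Flask, jsonify

app = Flask(__name__)

@app.post("/v1/chat/completions")
def chat_completions():
    # Always return the canned payload from the first example above
    return jsonify({
        "choices": [{
            "message": {
                "role": "assistant",
                "content": "This is a mock response for testing."
            }
        }]
    })

if __name__ == "__main__":
    app.run(port=4000)  # same port as the Quick Start proxy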

Option Comparison

Feature            Ollama       LocalAI          LiteLLM Mock  LM Studio
-----------------  -----------  ---------------  ------------  ----------
Setup Time         <5 min       10-15 min        <2 min        5-10 min
GUI                CLI only     CLI only         CLI only      ✓
GPU Support        ✓            ✓                —             ✓
OpenAI Compatible  ✓            ✓                ✓             ✓
Offline            ✓            ✓                ✓             ✓
Best For           Quick start  Production-like  CI/CD         Visual dev