● v2.4.0

LLM 代理服务器

企业级LLM代理服务，支持多模型路由、负载均衡、缓存优化、认证授权等完整功能，助您构建稳定高效的本地AI服务架构。

✓ 多模型支持 ✓ 负载均衡 ✓ 智能路由

server.sh

$ ./llm-proxy start

LLM Proxy Server v2.4.0 starting...
[✓] Loading configuration from config.yaml
[✓] Initializing connection pool (max: 100)
[✓] Registering models: gpt-4, claude-3, gemini-pro
[✓] Setting up load balancer (round-robin)
[✓] Enabling response cache (Redis)
[✓] Starting metrics collector
[✓] Server listening on :8080
[✓] Health check endpoint: /health
[✓] All systems operational

$ curl -X POST http://localhost:8080/v1/chat/completions

{
"id": "chatcmpl-8a7b6c5d4e3f",
"model": "gpt-4-turbo",
"usage": {
  "prompt_tokens": 1250,
  "completion_tokens": 380,
  "total_tokens": 1630
},
"latency": "142ms"
}

# 系统架构

☁️

客户端

Web/App/API

→

🚀

API Gateway

认证、限流、日志

↓

⚖️

负载均衡

轮询/最少连接

📡

模型路由器

智能分发

💾

缓存层

Redis/Memcached

📊

监控

Prometheus

🔒

安全

TLS/mTLS

↓

🤖 GPT-4

OpenAI

🧠 Claude-3

Anthropic

✨ Gemini Pro

Google

# 核心功能

⚖️

智能负载均衡

支持多种负载均衡策略，根据模型响应时间、错误率、并发数自动调整流量分配

# 配置负载均衡策略
load_balancer:
  strategy: weighted_response_time
  models:
    gpt-4: 0.5
    claude-3: 0.3
    gemini-pro: 0.2

💾

智能缓存

基于语义相似度的缓存机制，大幅降低重复请求延迟和成本

# 缓存配置
cache:
  enabled: true
  backend: redis
  ttl: 3600
  similarity_threshold: 0.95

🔐

企业级认证

支持API Key、OAuth 2.0、JWT等多种认证方式，精细化权限控制

# 认证配置
auth:
  providers:
    - api_key
    - oauth2
    - jwt
  rate_limit:1000 req/min

📊

实时监控

完整的请求日志、性能指标、错误追踪，支持Prometheus和Grafana集成

# 监控端点
metrics:
  enabled: true
  port: 9090
  exporters:
    - prometheus
    - datadog

# 快速部署

docker-compose.yml

version: '3.8'

services:
  llm-proxy:
    image: llmproxy/server:v2.4.0
    ports:
      - "8080:8080"
    volumes:
      - ./config.yaml:/app/config.yaml
    environment:
      - LOG_LEVEL=info
      - REDIS_URL=redis://cache:6379

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

1. 克隆仓库 git clone https://github.com/llmproxy/server

2. 配置模型 cp config.example.yaml config.yaml

3. 设置API Key export OPENAI_API_KEY=sk-...

4. 启动服务 docker-compose up -d

5. 验证部署 curl localhost:8080/health

# 性能指标

50K+

并发请求/秒

<10ms

网关延迟

99.99%

可用性SLA

负载测试结果

# wrk -t4 -c1000 -d60s http://localhost:8080/v1/chat/completions

Running 1m test @ http://localhost:8080/v1/chat/completions
4 threads and 1000 connections
  Thread Stats   Avg   Stdev   Max   Latency
  Req/Sec     12500.45 245.32 18920.00   15.23ms
  Latency      8.45ms   2.12ms   45.67ms   98.99%

750245 requests in 60.00s
12504.08 requests/second
0.00% requests failed