Plugin Capabilities

🔀 LLM Request Router

Intelligent routing of LLM requests across multiple providers based on cost, latency, availability, and custom policies.

  • Multi-provider load balancing
  • Weighted routing
  • Active health checks
  • Circuit breaker pattern
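The weighted-routing and health-check behavior above can be sketched in a few lines. This Python example is illustrative only (it is not Kong's plugin API); the provider names and 80/20 weights mirror the declarative configuration shown later in this page:

```python
import random

# Illustrative provider table; weights mirror the 80/20 split in the YAML example.
PROVIDERS = [
    {"name": "openai", "weight": 80, "healthy": True},
    {"name": "anthropic", "weight": 20, "healthy": True},
]

def select_provider(providers):
    """Weighted random selection over providers that pass health checks."""
    candidates = [p for p in providers if p["healthy"]]
    if not candidates:
        # All providers unhealthy: the circuit is open, fail fast.
        raise RuntimeError("no healthy providers (circuit open)")
    total = sum(p["weight"] for p in candidates)
    pick = random.uniform(0, total)
    for p in candidates:
        pick -= p["weight"]
        if pick <= 0:
            return p["name"]
    return candidates[-1]["name"]
```

Marking a provider unhealthy removes it from the candidate pool, so traffic shifts to the remaining providers without changing their relative weights.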
💾 Response Caching

Cache LLM responses to reduce API costs and latency. Supports multiple cache backends including Redis and PostgreSQL.

  • Content-based cache keys
  • TTL management
  • Cache hit/miss metrics
  • Cache warming strategies
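Content-based cache keys hash the request fields that actually affect the completion, so identical prompts hit the cache regardless of field ordering. A minimal sketch (the choice of fields is illustrative, not the plugin's exact behavior):

```python
import hashlib
import json

def cache_key(body: dict) -> str:
    """Derive a deterministic cache key from the request fields that
    influence the LLM output (illustrative field selection)."""
    relevant = {
        "model": body.get("model"),
        "messages": body.get("messages"),
        "temperature": body.get("temperature", 1.0),
    }
    # Canonical JSON (sorted keys, no whitespace) makes the hash order-independent.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return "llm:" + hashlib.sha256(canonical.encode()).hexdigest()
```

The resulting key is a fixed-length string suitable for Redis or a PostgreSQL lookup column, with the TTL applied at write time.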
🚦 Advanced Rate Limiting

Token-aware rate limiting designed specifically for LLM APIs with cost attribution and quota management.

  • Token-based quotas
  • Cost tracking per request
  • Burst handling
  • Per-consumer limits
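Token-aware limiting differs from plain request counting: each request debits its token cost from a bucket that refills over time, which also gives burst handling for free. A minimal token-bucket sketch (class and field names are illustrative, not the plugin's configuration schema):

```python
import time

class TokenQuota:
    """Token-bucket limiter metered in LLM tokens rather than request counts."""
    def __init__(self, tokens_per_minute: int, burst: int):
        self.rate = tokens_per_minute / 60.0   # refill rate, tokens per second
        self.capacity = burst                  # burst headroom
        self.available = float(burst)
        self.last = time.monotonic()

    def allow(self, token_cost: int) -> bool:
        # Refill proportionally to elapsed time, capped at burst capacity.
        now = time.monotonic()
        self.available = min(self.capacity, self.available + (now - self.last) * self.rate)
        self.last = now
        if token_cost <= self.available:
            self.available -= token_cost
            return True
        return False
```

One such bucket per consumer yields per-consumer limits; summing the debited token costs per consumer yields cost attribution.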
🔄 Request Transformation

Transform requests between different LLM API formats. Convert OpenAI requests to Anthropic or other provider formats automatically.

  • Format conversion
  • Header manipulation
  • Request body rewriting
  • Response normalization
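The core of OpenAI-to-Anthropic conversion is restructuring the message list: Anthropic's Messages API takes the system prompt as a top-level field rather than a message role. A simplified sketch (both schemas are abbreviated here; real provider requests carry more fields):

```python
def openai_to_anthropic(req: dict) -> dict:
    """Convert a simplified OpenAI chat request into Anthropic's Messages shape."""
    # Anthropic takes the system prompt as a top-level field, not a message role.
    system_parts = [m["content"] for m in req["messages"] if m["role"] == "system"]
    return {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),  # required by Anthropic; default is illustrative
        "system": "\n".join(system_parts) or None,
        "messages": [m for m in req["messages"] if m["role"] != "system"],
    }
```

Response normalization is the mirror image: mapping each provider's response envelope back to a single canonical shape before returning it to the client.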
📊 Observability Plugin

Comprehensive metrics, logging, and tracing for LLM API calls. Integrates with Datadog, Prometheus, and custom backends.

  • Token usage metrics
  • Latency histograms
  • Cost attribution
  • Error tracking
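Latency histograms typically follow the Prometheus convention of fixed, cumulative buckets. A dependency-free sketch of the bucketing logic (the bucket bounds are illustrative):

```python
import bisect

class LatencyHistogram:
    """Fixed-bucket latency histogram in the Prometheus style.
    Bucket upper bounds (seconds) are illustrative."""
    BOUNDS = [0.1, 0.5, 1.0, 2.0, 5.0]

    def __init__(self):
        self.counts = [0] * (len(self.BOUNDS) + 1)  # final slot is the +Inf bucket
        self.total = 0.0                            # running sum for computing averages

    def observe(self, seconds: float):
        # bisect_left gives the first bucket whose upper bound is >= the observation.
        self.counts[bisect.bisect_left(self.BOUNDS, seconds)] += 1
        self.total += seconds
```

The same pattern applies to token-usage metrics: one histogram or counter per (provider, model, consumer) label set gives cost attribution at query time.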
🔐 API Key Management

Centralized management of LLM provider API keys with rotation, encryption, and secure injection into requests.

  • Vault integration
  • Key rotation
  • Audited access
  • Per-route keys
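Per-route key injection amounts to a lookup table from route to the current provider secret, with rotation swapping the secret atomically. An in-memory sketch (a real deployment would back this with Vault or a KMS; names are illustrative):

```python
import time

class KeyStore:
    """In-memory sketch of per-route provider keys with rotation."""
    def __init__(self):
        self._keys = {}  # route name -> (secret, issued_at)

    def set_key(self, route: str, secret: str):
        self._keys[route] = (secret, time.time())

    def rotate(self, route: str, new_secret: str):
        # Replace the secret in one step so in-flight lookups never see a gap.
        self.set_key(route, new_secret)

    def auth_header(self, route: str) -> dict:
        secret, _ = self._keys[route]
        return {"Authorization": f"Bearer {secret}"}
```

The gateway injects the header at proxy time, so client applications never see or hold the upstream provider keys.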

Architecture Flow

Kong sits between your applications and LLM providers, managing all traffic through plugins.

Client Apps (SDK / API calls) → Kong Gateway (Plugin Chain) → OpenAI (GPT-4, etc.)

Kong Gateway also connects to:

  • Redis Cache (response cache)
  • PostgreSQL (rate limit data)
  • Prometheus (metrics)

Plugin Configuration

Kong Declarative Configuration (YAML)
# Enable LLM Proxy Plugin
_format_version: "3.0"
services:
  - name: openai-service
    url: https://api.openai.com/v1
    routes:
      - name: chat-route
        paths:
          - /v1/chat
        plugins:
          - name: llm-router
            config:
              providers:
                - name: openai
                  weight: 80
                - name: anthropic
                  weight: 20
              cache_enabled: true
              cache_ttl: 3600
          - name: rate-limiting
            config:
              minute: 100
              policy: local
              fault_tolerant: true
Custom Plugin (Lua)
-- kong/plugins/llm-proxy/handler.lua
local LLMProxyHandler = {
  VERSION = "1.0.0",
  PRIORITY = 1000,
}

function LLMProxyHandler:access(conf)
  local request = kong.request

  -- Check cache first
  local cache_key = generate_cache_key(request)
  local cached = cache_get(cache_key)
  if cached then
    return kong.response.exit(200, cached)
  end

  -- Route to appropriate provider
  local provider = select_provider(conf.providers)
  local response = forward_request(provider, request)

  -- Cache and return
  cache_set(cache_key, response, conf.cache_ttl)
  return kong.response.exit(response.status, response.body)
end

return LLMProxyHandler

Plugin Comparison

Feature                   Community Plugin    Enterprise Plugin
Multi-Provider Routing
Response Caching                              Advanced
Rate Limiting             Basic               Token-aware
Cost Tracking
Semantic Caching
Fine-tuning Support
24/7 Support