Plugin Capabilities

🔀 LLM Request Router

Intelligent routing of LLM requests across multiple providers based on cost, latency, availability, and custom policies.

  • Multi-provider load balancing
  • Weighted routing
  • Active health checks
  • Circuit breaker pattern
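The weighted-routing and health-check behavior above can be sketched in a few lines. This Python example is illustrative only (it is not Kong's plugin API); the provider names and 80/20 weights mirror the declarative configuration shown later in this page:

```python
import random

# Illustrative provider table; weights mirror the 80/20 split in the YAML example.
PROVIDERS = [
    {"name": "openai", "weight": 80, "healthy": True},
    {"name": "anthropic", "weight": 20, "healthy": True},
]

def select_provider(providers):
    """Weighted random selection over providers that pass health checks."""
    candidates = [p for p in providers if p["healthy"]]
    if not candidates:
        # All providers unhealthy: the circuit is open, fail fast.
        raise RuntimeError("no healthy providers (circuit open)")
    total = sum(p["weight"] for p in candidates)
    pick = random.uniform(0, total)
    for p in candidates:
        pick -= p["weight"]
        if pick <= 0:
            return p["name"]
    return candidates[-1]["name"]
```

Marking a provider unhealthy removes it from the candidate pool, so traffic shifts to the remaining providers without changing their relative weights.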
💾 Response Caching

Cache LLM responses to reduce API costs and latency. Supports multiple cache backends including Redis and PostgreSQL.

  • Content-based cache keys
  • TTL management
  • Cache hit/miss metrics
  • Cache warming strategies
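Content-based cache keys hash the request fields that actually affect the completion, so identical prompts hit the cache regardless of field ordering. A minimal sketch (the choice of fields is illustrative, not the plugin's exact behavior):

```python
import hashlib
import json

def cache_key(body: dict) -> str:
    """Derive a deterministic cache key from the request fields that
    influence the LLM output (illustrative field selection)."""
    relevant = {
        "model": body.get("model"),
        "messages": body.get("messages"),
        "temperature": body.get("temperature", 1.0),
    }
    # Canonical JSON (sorted keys, no whitespace) makes the hash order-independent.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return "llm:" + hashlib.sha256(canonical.encode()).hexdigest()
```

The resulting key is a fixed-length string suitable for Redis or a PostgreSQL lookup column, with the TTL applied at write time.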
🚦 Advanced Rate Limiting

Token-aware rate limiting designed specifically for LLM APIs with cost attribution and quota management.

  • Token-based quotas
  • Cost tracking per request
  • Burst handling
  • Per-consumer limits
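Token-aware limiting differs from plain request counting: each request debits its token cost from a bucket that refills over time, which also gives burst handling for free. A minimal token-bucket sketch (class and field names are illustrative, not the plugin's configuration schema):

```python
import time

class TokenQuota:
    """Token-bucket limiter metered in LLM tokens rather than request counts."""
    def __init__(self, tokens_per_minute: int, burst: int):
        self.rate = tokens_per_minute / 60.0   # refill rate, tokens per second
        self.capacity = burst                  # burst headroom
        self.available = float(burst)
        self.last = time.monotonic()

    def allow(self, token_cost: int) -> bool:
        # Refill proportionally to elapsed time, capped at burst capacity.
        now = time.monotonic()
        self.available = min(self.capacity, self.available + (now - self.last) * self.rate)
        self.last = now
        if token_cost <= self.available:
            self.available -= token_cost
            return True
        return False
```

One such bucket per consumer yields per-consumer limits; summing the debited token costs per consumer yields cost attribution.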
🔄 Request Transformation

Transform requests between different LLM API formats. Convert OpenAI requests to Anthropic or other provider formats automatically.

  • Format conversion
  • Header manipulation
  • Request body rewriting
  • Response normalization
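The core of OpenAI-to-Anthropic conversion is restructuring the message list: Anthropic's Messages API takes the system prompt as a top-level field rather than a message role. A simplified sketch (both schemas are abbreviated here; real provider requests carry more fields):

```python
def openai_to_anthropic(req: dict) -> dict:
    """Convert a simplified OpenAI chat request into Anthropic's Messages shape."""
    # Anthropic takes the system prompt as a top-level field, not a message role.
    system_parts = [m["content"] for m in req["messages"] if m["role"] == "system"]
    return {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),  # required by Anthropic; default is illustrative
        "system": "\n".join(system_parts) or None,
        "messages": [m for m in req["messages"] if m["role"] != "system"],
    }
```

Response normalization is the mirror image: mapping each provider's response envelope back to a single canonical shape before returning it to the client.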
📊 Observability Plugin

Comprehensive metrics, logging, and tracing for LLM API calls. Integrates with Datadog, Prometheus, and custom backends.

  • Token usage metrics
  • Latency histograms
  • Cost attribution
  • Error tracking
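Latency histograms typically follow the Prometheus convention of fixed, cumulative buckets. A dependency-free sketch of the bucketing logic (the bucket bounds are illustrative):

```python
import bisect

class LatencyHistogram:
    """Fixed-bucket latency histogram in the Prometheus style.
    Bucket upper bounds (seconds) are illustrative."""
    BOUNDS = [0.1, 0.5, 1.0, 2.0, 5.0]

    def __init__(self):
        self.counts = [0] * (len(self.BOUNDS) + 1)  # final slot is the +Inf bucket
        self.total = 0.0                            # running sum for computing averages

    def observe(self, seconds: float):
        # bisect_left gives the first bucket whose upper bound is >= the observation.
        self.counts[bisect.bisect_left(self.BOUNDS, seconds)] += 1
        self.total += seconds
```

The same pattern applies to token-usage metrics: one histogram or counter per (provider, model, consumer) label set gives cost attribution at query time.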
🔐 API Key Management

Centralized management of LLM provider API keys with rotation, encryption, and secure injection into requests.

  • Vault integration
  • Key rotation
  • Audited access
  • Per-route keys
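Per-route key injection amounts to a lookup table from route to the current provider secret, with rotation swapping the secret atomically. An in-memory sketch (a real deployment would back this with Vault or a KMS; names are illustrative):

```python
import time

class KeyStore:
    """In-memory sketch of per-route provider keys with rotation."""
    def __init__(self):
        self._keys = {}  # route name -> (secret, issued_at)

    def set_key(self, route: str, secret: str):
        self._keys[route] = (secret, time.time())

    def rotate(self, route: str, new_secret: str):
        # Replace the secret in one step so in-flight lookups never see a gap.
        self.set_key(route, new_secret)

    def auth_header(self, route: str) -> dict:
        secret, _ = self._keys[route]
        return {"Authorization": f"Bearer {secret}"}
```

The gateway injects the header at proxy time, so client applications never see or hold the upstream provider keys.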

Architecture Flow

Kong sits between your applications and LLM providers, managing all traffic through plugins.

Client Apps (SDK / API calls) → Kong Gateway (Plugin Chain) → OpenAI (GPT-4, etc.)

Kong Gateway also connects to:

  • Redis Cache (response cache)
  • PostgreSQL (rate limit data)
  • Prometheus (metrics)

Plugin Configuration

Kong Declarative Configuration (YAML)
# Enable LLM Proxy Plugin
_format_version: "3.0"
services:
  - name: openai-service
    url: https://api.openai.com/v1
    routes:
      - name: chat-route
        paths:
          - /v1/chat
        plugins:
          - name: llm-router
            config:
              providers:
                - name: openai
                  weight: 80
                - name: anthropic
                  weight: 20
              cache_enabled: true
              cache_ttl: 3600
          - name: rate-limiting
            config:
              minute: 100
              policy: local
              fault_tolerant: true
Custom Plugin (Lua)
-- kong/plugins/llm-proxy/handler.lua
local LLMProxyHandler = {
  VERSION = "1.0.0",
  PRIORITY = 1000,
}

function LLMProxyHandler:access(conf)
  local request = kong.request

  -- Check cache first
  local cache_key = generate_cache_key(request)
  local cached = cache_get(cache_key)
  if cached then
    return kong.response.exit(200, cached)
  end

  -- Route to appropriate provider
  local provider = select_provider(conf.providers)
  local response = forward_request(provider, request)

  -- Cache and return
  cache_set(cache_key, response, conf.cache_ttl)
  return kong.response.exit(response.status, response.body)
end

return LLMProxyHandler

Plugin Comparison

Feature                   Community Plugin    Enterprise Plugin
Multi-Provider Routing
Response Caching                              Advanced
Rate Limiting             Basic               Token-aware
Cost Tracking
Semantic Caching
Fine-tuning Support
24/7 Support