## Architecture Overview

### LiteLLM Architecture
- Python-based FastAPI application
- Native OpenAI-compatible API server
- Built-in translation layer for 100+ providers
- PostgreSQL/SQLite for persistence
- Redis for caching and rate limiting
- Async/await for concurrent requests
- Docker-first deployment model
- Lightweight ~50MB container image
- No external dependencies for basic setup
- Extensible via Python plugins
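Because the server speaks the OpenAI wire format natively, any OpenAI client can point at it unchanged. A minimal sketch, assuming a proxy running on localhost:4000 (the default port) and a virtual key `sk-litellm-demo` issued by that proxy:

```python
# Sketch: talk to a LiteLLM proxy with the stock OpenAI client.
# Assumes the proxy runs on localhost:4000 (the default) and that
# "sk-litellm-demo" is a virtual key you created on that proxy.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-litellm-demo")
response = client.chat.completions.create(
    model="gpt-4",  # the proxy routes this to the configured provider
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
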
### Kong Architecture
- OpenResty/Nginx-based proxy
- Plugin-driven architecture
- Database-backed (PostgreSQL/Cassandra)
- Declarative or database configuration
- Service mesh capabilities via Kong Mesh (Kuma-based)
- Multi-datacenter support
- Enterprise management plane
- Heavy container ~300MB image
- Requires database for full features
- Lua plugin development
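In practice Kong is driven either by declarative YAML (see the configuration examples below) or by its Admin API. A sketch of the Admin API path, assuming the default port 8001; the service and route names are illustrative:

```python
# Sketch: register an OpenAI passthrough service and route via
# Kong's Admin API (default port 8001). Names are placeholders.
import requests

ADMIN = "http://localhost:8001"

# Create a service pointing at the upstream LLM provider.
requests.post(f"{ADMIN}/services", json={
    "name": "openai-service",
    "url": "https://api.openai.com",
}).raise_for_status()

# Attach a route so traffic to /v1 is proxied to that service.
requests.post(f"{ADMIN}/services/openai-service/routes", json={
    "name": "openai-route",
    "paths": ["/v1"],
}).raise_for_status()
```
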
### APISIX Architecture
- OpenResty/Nginx-based proxy
- etcd for configuration storage
- Plugin ecosystem in Lua
- Hot-reload configuration
- Service discovery integration
- Native Kubernetes ingress
- APISIX Dashboard for management
- Lightweight ~100MB image
- No relational database required (etcd is the only store)
- Lua/Go plugin development
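Because all configuration lives in etcd, routes are created through the Admin API and take effect without a reload. A sketch, assuming the Admin API listens on port 9180 (9080 on older releases) and `your-admin-key` stands in for a real admin key:

```python
# Sketch: create a route through the APISIX Admin API. Assumes the
# Admin API on port 9180 (9080 on older releases); replace the
# placeholder key with your real admin key.
import requests

resp = requests.put(
    "http://127.0.0.1:9180/apisix/admin/routes/1",
    headers={"X-API-KEY": "your-admin-key"},
    json={
        "uri": "/v1/*",
        "upstream": {
            "type": "roundrobin",
            "nodes": {"api.openai.com:443": 1},
            "scheme": "https",
        },
    },
)
resp.raise_for_status()  # the change is hot-reloaded, no restart needed
```
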
## Feature Comparison Matrix

A side-by-side comparison of capabilities for AI gateway use cases:
| Feature | LiteLLM | Kong | APISIX |
|---|---|---|---|
| LLM Provider Support | 100+ (Native) | Plugin Required | Plugin Required |
| OpenAI API Compatible | ✓ Native | Via Plugin | Via Plugin |
| Token/Cost Tracking | ✓ Built-in | Plugin | Plugin |
| Streaming Support | ✓ | ✓ | ✓ |
| Semantic Caching | ✓ | ✕ | ✕ |
| Rate Limiting | ✓ | ✓ | ✓ |
| Load Balancing | ✓ | ✓ | ✓ |
| Service Mesh | ✕ | ✓ | ✓ |
| Kubernetes Native | Helm Chart | ✓ | ✓ |
| Self-Hosting | ✓ | ✓ | ✓ |
| Enterprise Support | ✓ | ✓ | ✓ |
| Learning Curve | Low | Medium-High | Medium |
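
To make the Token/Cost Tracking row concrete: LiteLLM ships per-model pricing tables and exposes them through its SDK. A minimal sketch using `completion_cost`, assuming an `OPENAI_API_KEY` in the environment:

```python
# Sketch: per-call cost accounting with LiteLLM's built-in pricing data.
# Assumes OPENAI_API_KEY is set in the environment.
from litellm import completion, completion_cost

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
)
# completion_cost combines the model's per-token pricing with the
# usage reported in the response to produce a dollar figure.
print(f"cost: ${completion_cost(completion_response=response):.6f}")
```
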
## Performance Benchmarks

*Benchmark charts: throughput (requests/sec), latency overhead (p99), and memory footprint.*
## When to Use Each Gateway

### Pure AI/LLM Applications: LiteLLM

When your primary use case is managing LLM API calls across multiple providers. Native support for 100+ models, built-in cost tracking, and semantic caching make LiteLLM the obvious choice for AI-focused workloads.
### Enterprise Microservices: Kong
When you need to manage both traditional APIs and AI endpoints in a unified platform. Kong's mature ecosystem, service mesh, and enterprise support make it ideal for large-scale deployments.
### Cloud-Native/Kubernetes: APISIX

When deploying in Kubernetes environments that need dynamic configuration. APISIX's etcd-based storage and native K8s integration provide an excellent cloud-native experience.
### Rapid AI Development: LiteLLM
When you need to quickly integrate multiple LLM providers without learning complex gateway configurations. LiteLLM's Python-based approach is familiar to ML engineers and data scientists.
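The appeal is that switching providers is a one-string change. A sketch, assuming `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are set in the environment:

```python
# Sketch: the same completion() call fans out to different providers;
# only the model string changes. Assumes OPENAI_API_KEY and
# ANTHROPIC_API_KEY are set in the environment.
from litellm import completion

for model in ("gpt-4", "claude-3-opus-20240229"):
    response = completion(
        model=model,
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(model, "->", response.choices[0].message.content[:60])
```
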
### Existing Kong Infrastructure
When your organization already uses Kong for API management. Adding LLM capabilities via plugins leverages existing expertise and infrastructure.
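As one example of that path, recent Kong releases ship an `ai-proxy` plugin that turns an ordinary route into an OpenAI-style LLM endpoint. A hedged sketch via the Admin API; the exact config schema varies by Kong version, so treat the fields as illustrative:

```python
# Sketch: enable Kong's ai-proxy plugin on an existing route via the
# Admin API. Field names follow recent Kong docs but may differ by
# version; "openai-route" is the route created earlier.
import requests

requests.post(
    "http://localhost:8001/routes/openai-route/plugins",
    json={
        "name": "ai-proxy",
        "config": {
            "route_type": "llm/v1/chat",
            "auth": {
                "header_name": "Authorization",
                "header_value": "Bearer your-openai-key",
            },
            "model": {"provider": "openai", "name": "gpt-4"},
        },
    },
).raise_for_status()
```
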
### Performance-Critical APIs: APISIX

When latency is paramount. APISIX's Nginx foundation delivers some of the lowest proxy overhead in the industry, making it ideal for high-performance requirements.
## Configuration Examples
```python
# LiteLLM - Python SDK call
from litellm import completion

response = completion(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    api_key="your-key",
)
```
```yaml
# Kong - Declarative YAML config
_format_version: "3.0"
services:
  - name: "openai-service"
    url: "https://api.openai.com"
    routes:
      - name: "openai-route"
        paths: ["/v1"]
```
APISIX route configuration (the JSON body sent to the Admin API):

```json
{
  "uri": "/v1/*",
  "upstream": {
    "nodes": {
      "api.openai.com:443": 1
    }
  }
}
```