AI Gateway Python

Build production-ready AI API gateways with Python, FastAPI, and modern asynchronous patterns. A complete guide covering everything from setup to deployment.

FastAPI 0.104+ · Python 3.11+ · Async/Await · Type Hints · Pydantic v2

FastAPI

Modern, fast web framework for building APIs with automatic OpenAPI documentation.

  • Automatic API documentation
  • Type hints & validation
  • Async request handling
  • Dependency injection

Django REST

Batteries-included framework with built-in admin, authentication, and ORM.

  • Built-in admin interface
  • Comprehensive ORM
  • Authentication system
  • Mature ecosystem

Flask + Extensions

Minimalist framework with flexible extension system for custom AI gateway needs.

  • Lightweight & flexible
  • Extensive extensions
  • Easy to customize
  • Simple deployment

Implementation Steps

1

Project Setup

Create a virtual environment, install dependencies, and set up the project structure.

2

API Gateway Core

Implement main gateway logic: request routing, authentication, rate limiting.

3

AI Provider Integration

Add support for OpenAI, Anthropic, Google AI, and other providers.
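One way to keep provider support extensible is a prefix registry instead of a growing if/elif chain. A sketch, assuming routing purely by model-name prefix (the `PROVIDERS` table and `select_provider` helper are illustrative names):

```python
# Map model-name prefixes to provider identifiers; extend to add providers
PROVIDERS = {
    "gpt": "openai",
    "claude": "anthropic",
    "gemini": "google",
}

def select_provider(model: str) -> str:
    """Return the provider responsible for a model name, or raise ValueError."""
    for prefix, provider in PROVIDERS.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"Unsupported model: {model}")
```

The gateway endpoint can then dispatch on the returned provider identifier, and adding a provider becomes a one-line table change plus a handler function.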

4

Middleware & Caching

Implement request/response middleware, caching strategies, and logging.

5

Testing & Deployment

Write tests, configure production settings, and deploy to your platform of choice.
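The pure routing and validation logic needs no running server to test. A unittest sketch (the `route_model` helper here is a stand-in for your gateway's actual routing function; endpoint-level tests would use FastAPI's `TestClient` instead):

```python
import unittest

def route_model(model: str) -> str:
    # Stand-in for the gateway's model-routing logic
    if model.startswith("gpt"):
        return "openai"
    if model.startswith("claude"):
        return "anthropic"
    raise ValueError(f"Unsupported model: {model}")

class TestRouting(unittest.TestCase):
    def test_openai_models(self):
        self.assertEqual(route_model("gpt-4"), "openai")

    def test_anthropic_models(self):
        self.assertEqual(route_model("claude-3-opus"), "anthropic")

    def test_unknown_model(self):
        with self.assertRaises(ValueError):
            route_model("mystery-model")
```

Run with `python -m unittest` (or pytest, which collects unittest cases as well).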

Code Examples

Python 3.11+
import os
from typing import List

import httpx
from fastapi import Depends, FastAPI, Header, HTTPException
from pydantic import BaseModel

# Pydantic models for request/response validation
class ChatMessage(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    messages: List[ChatMessage]
    model: str = "gpt-4"
    temperature: float = 0.7

class ChatResponse(BaseModel):
    content: str
    model: str
    tokens_used: int

app = FastAPI(title="AI API Gateway")

# Dependency for rate limiting; reads the caller's key from the X-API-Key header
async def rate_limiter(x_api_key: str = Header(...)) -> str:
    # Implement rate limiting logic here (e.g. Redis-backed counters);
    # raise HTTPException(429) when the caller exceeds its quota
    return x_api_key

# AI Gateway endpoint
@app.post("/v1/chat/completions")
async def chat_completion(
    request: ChatRequest,
    api_key: str = Depends(rate_limiter),
) -> ChatResponse:
    # Route to the appropriate AI provider based on the model name
    if "gpt" in request.model:
        return await handle_openai_request(request)
    elif "claude" in request.model:
        return await handle_anthropic_request(request)

    raise HTTPException(status_code=400, detail="Unsupported model")

async def handle_openai_request(request: ChatRequest) -> ChatResponse:
    # Async HTTP client for the OpenAI API
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://api.openai.com/v1/chat/completions",
            json=request.model_dump(),  # Pydantic v2 replacement for .dict()
            headers={
                "Authorization": f"Bearer {os.getenv('OPENAI_API_KEY')}"
            },
        )

        if response.status_code == 200:
            data = response.json()
            return ChatResponse(
                content=data["choices"][0]["message"]["content"],
                model=request.model,
                tokens_used=data["usage"]["total_tokens"],
            )

        raise HTTPException(status_code=response.status_code, detail=response.text)

# handle_anthropic_request follows the same pattern against the Anthropic API

Essential Packages

  • FastAPI: Modern web framework
  • Uvicorn: ASGI server implementation
  • Pydantic: Data validation & settings
  • HTTPX: Async HTTP client
  • Redis: Caching & rate limiting
  • Celery: Background task queue
  • Docker: Containerization
  • K8s: Kubernetes deployment