AI API Proxy Documentation

Complete API reference documentation for AI API Proxy implementations. Version 2026.3

🚀 Quick Start

Get started in minutes with these basic examples:

bash
# Install the SDK
npm install ai-api-proxy

javascript
// Make your first API call
import { AIProxyClient } from 'ai-api-proxy';

const client = new AIProxyClient({
    apiKey: 'your_api_key_here',
    baseURL: 'https://api.example.com/v1'
});

const response = await client.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }]
});

Overview

The AI API Proxy provides a unified interface for accessing multiple AI providers through a single API. It handles authentication, rate limiting, request routing, and response formatting for you.

🔑 Authentication

All API requests require authentication using API keys. Include your API key in the request headers:

http
GET /v1/models
Authorization: Bearer YOUR_API_KEY
X-API-Key: YOUR_API_KEY

API keys can be obtained from your dashboard. Keep them secure and never expose them in client-side code.
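
As a minimal sketch of the header layout above, the helper below builds both headers from a single key. The function name `buildAuthHeaders` and the environment variable `AI_PROXY_API_KEY` are hypothetical and used only for illustration; sending both headers simply mirrors the example request.

```javascript
// Hypothetical helper: builds the two authentication headers shown in the example request.
function buildAuthHeaders(apiKey) {
    if (typeof apiKey !== 'string' || apiKey.trim() === '') {
        throw new Error('API key is missing');
    }
    return {
        'Authorization': `Bearer ${apiKey}`,
        'X-API-Key': apiKey
    };
}

// Usage (server-side): read the key from the environment instead of hard-coding it.
// AI_PROXY_API_KEY is an assumed variable name, not one defined by the API.
// const headers = buildAuthHeaders(process.env.AI_PROXY_API_KEY);
```

Reading the key from the environment on the server keeps it out of source control and out of client-side bundles.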

⚡ Rate Limiting

The API implements rate limiting to ensure fair usage. Limits are applied per API key and IP address.

Limit Type       | Requests | Window   | Description
Standard         | 60       | 1 minute | Basic rate limit for all endpoints
Chat Completions | 20       | 1 minute | Specific limit for chat endpoints
Embeddings       | 100      | 1 minute | Higher limit for embeddings
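
To stay under these limits, a client can throttle itself before sending. The sketch below is a client-side sliding-window limiter using the request counts from the table; the class name `SlidingWindowLimiter` is hypothetical, and whether the server counts requests in a sliding or a fixed window is an assumption, since the documentation does not say.

```javascript
// Requests allowed per window, taken from the rate limit table.
const LIMITS = { standard: 60, chat: 20, embeddings: 100 };
const WINDOW_MS = 60_000; // 1 minute

// Hypothetical client-side limiter; the server's own algorithm may differ.
class SlidingWindowLimiter {
    constructor(maxRequests, windowMs = WINDOW_MS) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
        this.timestamps = []; // send times of requests still inside the window
    }

    // Returns true and records the request if it fits in the current window.
    tryAcquire(now = Date.now()) {
        // Drop timestamps that have left the window
        while (this.timestamps.length > 0 && now - this.timestamps[0] >= this.windowMs) {
            this.timestamps.shift();
        }
        if (this.timestamps.length >= this.maxRequests) return false;
        this.timestamps.push(now);
        return true;
    }
}

const chatLimiter = new SlidingWindowLimiter(LIMITS.chat);
```

Call `chatLimiter.tryAcquire()` before each chat request and wait or queue the request when it returns `false`.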

API Endpoints

The API follows RESTful conventions and returns JSON responses. All endpoints are prefixed with /v1/.

POST /v1/chat/completions

Create chat completions with AI models. Supports streaming responses.

Request Parameters

Parameter   | Type    | Required | Description
model       | string  | Required | ID of the model to use (e.g., gpt-4, claude-3)
messages    | array   | Required | Array of message objects
temperature | number  | Optional | Sampling temperature (0-2)
max_tokens  | number  | Optional | Maximum tokens to generate
stream      | boolean | Optional | Enable streaming responses

Request Example

json
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1000,
  "stream": false
}
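
Validating the body before sending avoids a round trip that would end in 400 Bad Request. The sketch below checks the parameters from the table above; `buildChatRequest` is a hypothetical helper, and two of its rules go beyond what the table states and are assumptions: that `messages` must be non-empty and that `max_tokens` must be a positive integer.

```javascript
// Hypothetical helper: validates parameters and returns a request body.
function buildChatRequest({ model, messages, temperature, max_tokens, stream } = {}) {
    if (typeof model !== 'string' || model === '') {
        throw new TypeError('model is required');
    }
    // Assumption: an empty messages array is not useful, so reject it
    if (!Array.isArray(messages) || messages.length === 0) {
        throw new TypeError('messages must be a non-empty array');
    }
    if (temperature !== undefined &&
        (typeof temperature !== 'number' || temperature < 0 || temperature > 2)) {
        throw new RangeError('temperature must be a number between 0 and 2');
    }
    // Assumption: max_tokens is a positive integer
    if (max_tokens !== undefined && (!Number.isInteger(max_tokens) || max_tokens <= 0)) {
        throw new RangeError('max_tokens must be a positive integer');
    }

    // Include optional fields only when the caller set them
    const body = { model, messages };
    if (temperature !== undefined) body.temperature = temperature;
    if (max_tokens !== undefined) body.max_tokens = max_tokens;
    if (stream !== undefined) body.stream = Boolean(stream);
    return body;
}

const body = buildChatRequest({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello!' }],
    temperature: 0.7
});
```

The result can be passed to `JSON.stringify` and sent as the body of `POST /v1/chat/completions`.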

Response Example

200 OK Success Response
json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
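
A minimal sketch of reading the fields most callers need from the response above. `summarizeCompletion` is a hypothetical helper; it assumes the response has the shape shown in the example and falls back to defaults when a field is absent.

```javascript
// Hypothetical helper: pulls the reply text, finish reason, and token count
// out of a chat completion response shaped like the example above.
function summarizeCompletion(completion) {
    const first = completion.choices?.[0];
    return {
        text: first?.message?.content ?? '',
        finishReason: first?.finish_reason ?? null,
        totalTokens: completion.usage?.total_tokens ?? 0
    };
}

// Trimmed copy of the example response
const sample = {
    id: 'chatcmpl-123',
    choices: [
        {
            index: 0,
            message: { role: 'assistant', content: 'Hello! How can I help you today?' },
            finish_reason: 'stop'
        }
    ],
    usage: { prompt_tokens: 9, completion_tokens: 12, total_tokens: 21 }
};

const summary = summarizeCompletion(sample);
```

Here `summary.text` holds the assistant message and `summary.totalTokens` is 21, matching the `usage` block.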

POST /v1/embeddings

Generate embeddings for input text using AI models.

Request Parameters

Parameter       | Type         | Required | Description
model           | string       | Required | ID of the embedding model
input           | string/array | Required | Text to embed (string or array of strings)
encoding_format | string       | Optional | Format of embeddings (float, base64)

Response Example

200 OK Success Response
json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        0.0023064255,
        -0.009327292,
        ... (1536 floats total for text-embedding-ada-002)
      ],
      "index": 0
    }
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
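
Embeddings are commonly compared with cosine similarity. The sketch below is one illustrative way to use the response above, not part of the API: `embeddingVectors` and `cosineSimilarity` are hypothetical helpers, and the code assumes `encoding_format` is `float` so that each `embedding` is an array of numbers.

```javascript
// Hypothetical helper: returns the vectors from an embeddings response,
// ordered by their "index" field so they line up with the input array.
function embeddingVectors(response) {
    return [...response.data]
        .sort((x, y) => x.index - y.index)
        .map(item => item.embedding);
}

// Cosine similarity of two equal-length numeric vectors.
function cosineSimilarity(a, b) {
    if (a.length !== b.length) {
        throw new RangeError('vectors must have the same length');
    }
    let dot = 0, normA = 0, normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    // A zero vector has no direction; treat similarity as 0
    if (normA === 0 || normB === 0) return 0;
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.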

GET /v1/models

List available AI models and their capabilities.

Response Example

200 OK Success Response
json
{
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1677610602,
      "owned_by": "openai",
      "permission": [...],
      "root": "gpt-4",
      "parent": null,
      "context_length": 8192,
      "capabilities": ["chat", "completion"]
    },
    {
      "id": "claude-3-opus",
      "object": "model",
      "created": 1680000000,
      "owned_by": "anthropic",
      "permission": [...],
      "root": "claude-3-opus",
      "parent": null,
      "context_length": 200000,
      "capabilities": ["chat", "reasoning"]
    }
  ]
}
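
The `capabilities` and `context_length` fields make it possible to choose a model at runtime. The sketch below filters a model list shaped like the response above; `findModels` is a hypothetical helper and the sample data is a trimmed copy of the example.

```javascript
// Hypothetical helper: returns IDs of models that have the given capability
// and at least the requested context length.
function findModels(modelList, capability, minContextLength = 0) {
    return modelList.data
        .filter(m => (m.capabilities ?? []).includes(capability))
        .filter(m => (m.context_length ?? 0) >= minContextLength)
        .map(m => m.id);
}

// Trimmed copy of the example response
const models = {
    object: 'list',
    data: [
        { id: 'gpt-4', context_length: 8192, capabilities: ['chat', 'completion'] },
        { id: 'claude-3-opus', context_length: 200000, capabilities: ['chat', 'reasoning'] }
    ]
};

const chatModels = findModels(models, 'chat');
const longContextChat = findModels(models, 'chat', 100000);
```

With the sample data, `chatModels` contains both IDs and `longContextChat` contains only `claude-3-opus`.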

Error Codes

The API uses conventional HTTP response codes to indicate success or failure of requests.

400 Bad Request

The request was malformed or missing required parameters.

401 Unauthorized

Invalid or missing authentication credentials.

403 Forbidden

Valid credentials but insufficient permissions.

429 Too Many Requests

Rate limit exceeded. Check rate limit headers.

500 Internal Server Error

Server-side error. Contact support if persistent.

502 Bad Gateway

Upstream provider unavailable. Retry with exponential backoff.

Error Response Format

json
{
  "error": {
    "message": "Invalid API key",
    "type": "invalid_request_error",
    "param": null,
    "code": "invalid_api_key"
  }
}
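
One way to surface this format in client code is to wrap it in an error class. The sketch below is hypothetical: `ProxyApiError` is not part of any SDK, and treating 429 and 5xx statuses as retryable is an assumption drawn from the error code descriptions above.

```javascript
// Hypothetical error class wrapping the error response format shown above.
class ProxyApiError extends Error {
    constructor(status, body) {
        const err = body?.error ?? {};
        super(err.message ?? `Request failed with status ${status}`);
        this.name = 'ProxyApiError';
        this.status = status;
        this.type = err.type ?? null;
        this.code = err.code ?? null;
        this.param = err.param ?? null;
    }

    // Assumption: rate limit and server-side errors are worth retrying,
    // while other 4xx errors need a corrected request.
    get retryable() {
        return this.status === 429 || this.status >= 500;
    }
}

const authError = new ProxyApiError(401, {
    error: {
        message: 'Invalid API key',
        type: 'invalid_request_error',
        param: null,
        code: 'invalid_api_key'
    }
});
```

Callers can then branch on `authError.code` or `authError.retryable` instead of parsing the JSON body at every call site.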

Best Practices

🔄 Retry Logic

Implement exponential backoff for retrying failed requests:

javascript
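// Retries network errors, 429, and 5xx responses with exponential backoff.
async function makeRequestWithRetry(url, options, maxRetries = 3) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        const isLastAttempt = attempt === maxRetries - 1;
        try {
            const response = await fetch(url, options);
            // Return on success, on client errors a retry cannot fix
            // (e.g. 400, 401, 403), or when no attempts remain
            const retryable = response.status === 429 || response.status >= 500;
            if (response.ok || !retryable || isLastAttempt) return response;
        } catch (error) {
            // Network failure: rethrow only after the final attempt
            if (isLastAttempt) throw error;
        }

        // Exponential backoff: 1s, 2s, 4s, ...
        const delay = Math.pow(2, attempt) * 1000;
        await new Promise(resolve => setTimeout(resolve, delay));
    }
}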

📊 Monitoring & Logging

  • Log all API requests and responses (excluding sensitive data)
  • Monitor rate limit usage using response headers
  • Track token usage for cost optimization
  • Implement alerting for error rate spikes
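
As a rough sketch of the token and error tracking suggested above, the class below accumulates counts from each response. `UsageTracker` is a hypothetical name, the `usage` argument is assumed to have the shape shown in the response examples, and any alert threshold is left to the caller.

```javascript
// Hypothetical in-memory tracker for token usage and error rate.
class UsageTracker {
    constructor() {
        this.requests = 0;
        this.errors = 0;
        this.promptTokens = 0;
        this.completionTokens = 0;
    }

    // Record one finished request: its HTTP status and, if present, its "usage" object.
    record(status, usage) {
        this.requests += 1;
        if (status >= 400) this.errors += 1;
        this.promptTokens += usage?.prompt_tokens ?? 0;
        this.completionTokens += usage?.completion_tokens ?? 0;
    }

    get totalTokens() {
        return this.promptTokens + this.completionTokens;
    }

    // Fraction of recorded requests that returned an error status
    get errorRate() {
        return this.requests === 0 ? 0 : this.errors / this.requests;
    }
}

const tracker = new UsageTracker();
tracker.record(200, { prompt_tokens: 9, completion_tokens: 12, total_tokens: 21 });
tracker.record(429);
```

After these two calls `tracker.totalTokens` is 21 and `tracker.errorRate` is 0.5; a production setup would typically export such counters to a metrics system instead of holding them in memory.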