AI API Gateway Mock Responses

Accelerate development with mock AI responses that simulate real API behavior without backend dependencies

Mock responses in AI API gateways enable development and testing without actual AI backend dependencies. By intercepting requests and returning predefined or dynamically generated responses, teams can develop faster, test more thoroughly, and reduce costs associated with AI API calls during development cycles.

Static Mocks

Predefined responses returned for matching requests. Fast and predictable.

Dynamic Mocks

Generated responses based on request parameters and templates.

Recorded Mocks

Captured real responses replayed for consistent testing.

Smart Mocks

Context-aware responses simulating intelligent behavior.

Mock Response Architecture

The mock response system operates as a middleware layer in the API gateway, intercepting requests before they reach backend AI services. Configuration determines which requests receive mock responses and which pass through to real backends.

// Mock configuration for AI API gateway
mocking:
  enabled: true
  mode: hybrid  // mixed, mock-only, passthrough
  
  rules:
    - match:
        path: /v1/chat/completions
        method: POST
      response:
        type: dynamic
        template: chat-response
        delay: 500ms  // simulate latency
        
    - match:
        path: /v1/embeddings
      response:
        type: static
        file: ./mocks/embedding-response.json
        
  templates:
    chat-response:
      model: "gpt-3.5-turbo-mock"
      choices:
        - message:
            role: assistant
            content: "{{generate_response}}"
          finish_reason: stop

Static Mock Responses

Static mocks provide consistent, predictable responses for testing. Defined in JSON or YAML files, these responses enable reproducible test scenarios and documentation examples.

Use static mocks for unit testing where predictable responses are essential, documentation examples showing API response formats, error scenario testing simulating various error conditions, and demo environments requiring stable behavior.

Dynamic Mock Generation

Dynamic mocks generate responses based on request content, providing more realistic testing without real AI backends. Template engines create contextually appropriate responses.

Request-based generation - Echo or transform request content in responses
Template interpolation - Fill response templates with request-derived values
Conditional responses - Return different responses based on request parameters
Stateful mocking - Maintain conversation context across multiple requests

Dynamic Mock Tip

Implement smart token counting in mocks by analyzing request prompt length and generating appropriately sized responses. This helps test token-based billing logic without real API costs.

Recording and Playback

Recorded mocks capture real API responses during initial development or integration testing sessions. Subsequent requests replay these recordings, ensuring consistency while reducing costs.

Recording Strategies

First-run recording captures responses on initial request and replays for subsequent identical requests. Scheduled recording refreshes captures periodically to reflect backend changes. Manual recording explicitly records specific scenarios for test fixtures.

Environment-Specific Mocking

Different environments have different mocking requirements. Development environments may use mocks extensively, while staging environments mix mocks with real backends for integration testing.

Development uses mock-only mode for fast iteration without API costs. Testing combines mocks for edge cases with real APIs for integration validation. Staging minimizes mocking for production-like validation. Production disables mocking except for specific test accounts.

Testing with Mock Responses

AI API gateway mock responses enable comprehensive testing scenarios that would be expensive or impossible with real AI backends. Design test suites that leverage mocks for thorough validation.

Test rate limiting logic by mocking rate limit errors. Validate timeout handling with delayed mock responses. Simulate model failures through error mocks. Test token counting with predictable mock responses. Verify retry logic using intermittent failure mocks.

Mock Server Configuration

Configure the mock server through gateway configuration files or dynamic API endpoints. Flexible configuration enables rapid iteration during development.

# Enable/disable mocking dynamically
POST /admin/mocking/enable
{
  "mode": "hybrid",
  "default_behavior": "passthrough"
}

# Add mock rule dynamically
POST /admin/mocking/rules
{
  "match": {
    "path": "/v1/completions",
    "header": {"X-Test-Scenario": "error-rate-limit"}
  },
  "response": {
    "status": 429,
    "body": {
      "error": {
        "message": "Rate limit exceeded",
        "type": "rate_limit_error"
      }
    }
  }
}