AI API Proxy Health Checks

Implement robust health checking for AI API proxies. Ensure 99.9% uptime through active probing, passive monitoring, and intelligent failover strategies that maintain service continuity during provider issues.

[Status dashboard: System Health ● Live — 99.97% overall uptime. Primary: OpenAI healthy (latency 1.2s, last check 5s ago). Fallback: Anthropic ready (latency 0.9s, last check 3s ago). Backup: Gemini degraded (monitoring).]

Understanding Health Checks

Health checks form the foundation of reliable distributed systems. For AI API proxies, they serve a critical role: detecting provider failures before user requests fail, enabling automatic failover to backup providers, and maintaining overall system reliability. Without robust health checking, your proxy becomes a single point of failure that can amplify provider issues into complete service outages.

Effective health checking combines multiple techniques to achieve comprehensive coverage. Active probing tests provider endpoints explicitly, while passive monitoring observes actual request behavior. Circuit breakers prevent cascading failures, and automated failover ensures continuity when primary providers degrade.

Active Probing

Periodic requests to provider endpoints to verify availability and measure latency.

Passive Monitoring

Observe real request outcomes to detect issues active checks might miss.

Circuit Breaker

Automatically stop sending requests to failing providers to prevent cascading failures.

Implementation Strategies

Implementing health checks requires balancing thoroughness with efficiency. Checking too aggressively wastes resources and may trigger provider rate limits; checking too passively risks missing issues until users are affected. The optimal strategy combines multiple approaches at appropriate intervals.

Active Health Check Configuration

Active health checks send periodic requests to provider endpoints to verify they respond correctly. These checks should test the full request path, including authentication, to catch configuration issues that simpler endpoint pings might miss.

yaml - Health Check Configuration
health_checks:
  active:
    enabled: true
    interval: 30s  # Check every 30 seconds
    timeout: 10s  # Max wait for response
    unhealthy_threshold: 3  # Failures before marking unhealthy
    healthy_threshold: 2  # Successes before marking healthy
    
  providers:
    openai:
      endpoint: https://api.openai.com/v1/models
      method: GET
      headers:
        Authorization: "Bearer ${OPENAI_API_KEY}"
      expected_status: 200
      
    anthropic:
      endpoint: https://api.anthropic.com/v1/messages
      method: POST
      headers:
        x-api-key: "${ANTHROPIC_API_KEY}"
        anthropic-version: "2023-06-01"
      body:
        model: "claude-3-sonnet-20240229"
        max_tokens: 1
        messages:
          - role: user
            content: "ping"
      expected_status: 200

Passive Monitoring Implementation

Passive monitoring observes actual request outcomes without generating additional traffic. This approach catches issues that active checks miss, such as model-specific errors or authentication problems that only manifest under specific request patterns.

python - Passive Health Monitor
from collections import defaultdict, deque

class PassiveHealthMonitor:
    def __init__(self, window_size=100):
        # Bounded deques keep only the most recent outcomes per provider,
        # so old requests age out automatically
        self.window_size = window_size
        self.error_flags = defaultdict(lambda: deque(maxlen=window_size))
        self.latencies = defaultdict(lambda: deque(maxlen=window_size))

    def record_request(self, provider, success, latency):
        """Record request outcome for health calculation"""
        self.error_flags[provider].append(0 if success else 1)
        self.latencies[provider].append(latency)

    def is_healthy(self, provider):
        """Determine health based on recent requests"""
        errors = self.error_flags[provider]
        if not errors:
            return True  # No data yet: assume healthy

        error_rate = sum(errors) / len(errors)
        avg_latency = sum(self.latencies[provider]) / len(self.latencies[provider])

        # Health criteria: under 5% errors and under 5s average latency
        return error_rate < 0.05 and avg_latency < 5.0

Implementation Tip

Combine active and passive health checks for comprehensive coverage. Active checks verify basic connectivity, while passive monitoring catches nuanced issues that only appear under real traffic patterns.
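One way to merge the two signals, under the assumption that an active-probe failure indicates broken connectivity while a passive-only failure indicates degradation under real traffic, is a conservative three-state merge. The state names and precedence here are illustrative:

```python
from enum import Enum

class HealthState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    UNHEALTHY = "unhealthy"

def combine_signals(active_ok, passive_ok):
    """Conservative merge: either failing signal downgrades the provider.
    Active failure alone means basic connectivity is broken (unhealthy);
    passive failure alone means live traffic is suffering (degraded)."""
    if active_ok and passive_ok:
        return HealthState.HEALTHY
    if not active_ok:
        return HealthState.UNHEALTHY
    return HealthState.DEGRADED
```

A router can then send traffic only to `HEALTHY` providers, keep `DEGRADED` ones as last-resort fallbacks, and exclude `UNHEALTHY` ones entirely.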

Circuit Breaker Pattern

Circuit breakers prevent cascading failures by automatically stopping requests to unhealthy providers. When a provider fails repeatedly, the circuit breaker "opens," immediately rejecting requests without attempting the actual call. After a timeout period, it enters a "half-open" state to test if the provider has recovered.

Circuit Breaker States

Closed: Normal operation. Requests flow through to the provider. Failures increment a counter.

Open: Provider is unhealthy. All requests fail immediately without calling the provider. Prevents wasted resources and allows provider to recover.

Half-Open: Testing recovery. Limited requests pass through to verify provider health. If successful, circuit closes; if failures continue, circuit reopens.

python - Circuit Breaker Implementation
from enum import Enum
from datetime import datetime, timedelta

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timedelta(seconds=timeout)
        self.last_failure_time = None
    
    def call(self, func, *args, **kwargs):
        """Execute function with circuit breaker protection"""
        if self.state == CircuitState.OPEN:
            if datetime.now() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")
        
        try:
            result = func(*args, **kwargs)
            self.record_success()
            return result
        except Exception as e:
            self.record_failure()
            raise e
    
    def record_success(self):
        """Reset on successful call"""
        self.failure_count = 0
        self.state = CircuitState.CLOSED
    
    def record_failure(self):
        """Track failures and open circuit if threshold exceeded"""
        self.failure_count += 1
        self.last_failure_time = datetime.now()
        
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

Best Practices

Effective health checking requires more than technical implementation. Operational practices, monitoring integration, and team processes all contribute to achieving genuine reliability improvements.

Health Check Frequency

Balance between detection speed and resource consumption. For most AI providers, checking every 30 seconds provides adequate coverage without excessive overhead. Increase frequency for critical production systems where faster detection justifies additional costs.

Failure Definition

Define failure comprehensively. HTTP errors are obvious failures, but slow responses may also indicate degraded performance. Consider latency thresholds that trigger warnings before actual failures occur, enabling proactive routing adjustments.
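A response classifier along these lines might map each outcome to one of three signals. The 2s warning and 5s failure thresholds below are illustrative defaults, not prescriptive values:

```python
def classify_response(status_code, latency_s,
                      warn_latency=2.0, fail_latency=5.0):
    """Map one response to a health signal: server errors and very slow
    responses count as failures; rate limiting and moderately slow
    responses count as degraded rather than failed."""
    if status_code >= 500 or latency_s >= fail_latency:
        return "failure"
    if status_code == 429 or latency_s >= warn_latency:
        return "degraded"
    return "healthy"
```

Routing a few percent of "degraded" results into a warning counter lets the proxy shift traffic before the provider crosses the failure threshold.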

Alerting Strategy

Configure alerts for health state transitions, not just failures. A provider moving from healthy to degraded warrants investigation before it becomes unhealthy. Trend-based alerting catches gradual degradation that individual checks might miss.
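Transition-based alerting can be as simple as remembering the last observed state per provider and firing only on change; the alert callback and message format here are placeholders:

```python
class TransitionAlerter:
    """Emits an alert only when a provider's health state changes,
    suppressing repeated alerts for an unchanged state."""

    def __init__(self, alert_fn):
        self.alert_fn = alert_fn  # e.g. a pager/Slack hook
        self.last_state = {}

    def observe(self, provider, state):
        prev = self.last_state.get(provider)
        self.last_state[provider] = state
        if prev is not None and prev != state:
            self.alert_fn(f"{provider}: {prev} -> {state}")
```

Feeding every health evaluation through `observe` gives one alert per transition (healthy to degraded, degraded to unhealthy, and back), which is what on-call engineers actually need to see.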

Key Principle

Health checks should test the full request path, including authentication and model availability. Simple endpoint pings may return success even when actual API calls would fail due to rate limits or configuration issues.
