Understanding Health Checks
Health checks form the foundation of reliable distributed systems. For AI API proxies, they serve a critical role: detecting provider failures before user requests fail, enabling automatic failover to backup providers, and maintaining overall system reliability. Without robust health checking, your proxy becomes a single point of failure that can amplify provider issues into complete service outages.
Effective health checking combines multiple techniques to achieve comprehensive coverage. Active probing tests provider endpoints explicitly, while passive monitoring observes actual request behavior. Circuit breakers prevent cascading failures, and automated failover ensures continuity when primary providers degrade.
Active Probing
Send periodic requests to provider endpoints to verify availability and measure latency.
Passive Monitoring
Observe real request outcomes to detect issues active checks might miss.
Circuit Breaker
Automatically stop sending requests to failing providers to prevent cascading failures.
Implementation Strategies
Implementing health checks requires balancing thoroughness with efficiency. Overly aggressive checking wastes resources and may trigger provider rate limits; overly passive checking risks missing issues until users are affected. The optimal strategy combines multiple approaches with appropriate intervals.
Active Health Check Configuration
Active health checks send periodic requests to provider endpoints to verify they respond correctly. These checks should test the full request path, including authentication, to catch configuration issues that simpler endpoint pings might miss.
health_checks:
  active:
    enabled: true
    interval: 30s            # Check every 30 seconds
    timeout: 10s             # Max wait for response
    unhealthy_threshold: 3   # Failures before marking unhealthy
    healthy_threshold: 2     # Successes before marking healthy
    providers:
      openai:
        endpoint: https://api.openai.com/v1/models
        method: GET
        headers:
          Authorization: "Bearer ${OPENAI_API_KEY}"
        expected_status: 200
      anthropic:
        endpoint: https://api.anthropic.com/v1/messages
        method: POST
        headers:
          x-api-key: "${ANTHROPIC_API_KEY}"
          anthropic-version: "2023-06-01"
        body:
          model: "claude-3-sonnet-20240229"
          max_tokens: 1
          messages:
            - role: user
              content: "test"
        expected_status: 200
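The YAML above assumes a proxy with built-in declarative health checks. Where no such feature exists, the same loop is straightforward to hand-roll. The following is a minimal sketch using the requests library; the probe and run_active_checks names, and the shape of the provider config dict, are illustrative rather than part of any particular proxy.

import time
import requests

def probe(cfg):
    """Send one active health check; return (healthy, latency in seconds)."""
    start = time.monotonic()
    try:
        resp = requests.request(
            cfg["method"],
            cfg["endpoint"],
            headers=cfg.get("headers", {}),
            json=cfg.get("body"),
            timeout=10,
        )
        healthy = resp.status_code == cfg.get("expected_status", 200)
    except requests.RequestException:
        healthy = False
    return healthy, time.monotonic() - start

def run_active_checks(providers, interval=30):
    """Probe every configured provider on a fixed interval."""
    while True:
        for name, cfg in providers.items():
            healthy, latency = probe(cfg)
            print(f"{name}: healthy={healthy}, latency={latency:.2f}s")
        time.sleep(interval)

A production version would also apply the unhealthy/healthy thresholds before flipping a provider's state, rather than reacting to a single probe.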
Passive Monitoring Implementation
Passive monitoring observes actual request outcomes without generating additional traffic. This approach catches issues that active checks miss, such as model-specific errors or authentication problems that only manifest under specific request patterns.
from collections import defaultdict

class PassiveHealthMonitor:
    def __init__(self):
        self.error_rates = defaultdict(list)
        self.latencies = defaultdict(list)
        self.window_size = 100  # Track last 100 requests

    def record_request(self, provider, success, latency):
        """Record request outcome for health calculation."""
        self.error_rates[provider].append(0 if success else 1)
        self.latencies[provider].append(latency)
        # Maintain window size
        if len(self.error_rates[provider]) > self.window_size:
            self.error_rates[provider].pop(0)
            self.latencies[provider].pop(0)

    def is_healthy(self, provider):
        """Determine health based on recent requests."""
        errors = self.error_rates[provider]
        if not errors:
            return True  # No data yet; assume healthy
        error_rate = sum(errors) / len(errors)
        avg_latency = sum(self.latencies[provider]) / len(self.latencies[provider])
        # Health criteria: <5% errors and <5s average latency
        return error_rate < 0.05 and avg_latency < 5.0
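Feeding the monitor from the request path might look like the sketch below; call_provider is a hypothetical stand-in for whatever function actually dispatches the upstream request.

import time

monitor = PassiveHealthMonitor()

def proxied_request(provider, call_provider, payload):
    """Dispatch an upstream request and record its outcome for passive tracking."""
    start = time.monotonic()
    try:
        response = call_provider(payload)
        monitor.record_request(provider, success=True, latency=time.monotonic() - start)
        return response
    except Exception:
        monitor.record_request(provider, success=False, latency=time.monotonic() - start)
        raise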
Implementation Tip
Combine active and passive health checks for comprehensive coverage. Active checks verify basic connectivity, while passive monitoring catches nuanced issues that only appear under real traffic patterns.
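One way to combine the two signals, reusing the monitor instance from the passive sketch above and assuming the active probe loop maintains a simple provider-to-boolean dict: a provider is only routable when both views agree.

active_status = {}  # provider -> bool, updated by the active probe loop

def is_routable(provider):
    """Route to a provider only if active probes and passive monitoring both pass."""
    return active_status.get(provider, True) and monitor.is_healthy(provider)

def pick_provider(primary, fallbacks):
    """Return the first routable provider, preferring the primary."""
    for candidate in (primary, *fallbacks):
        if is_routable(candidate):
            return candidate
    raise RuntimeError("No healthy providers available")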
Circuit Breaker Pattern
Circuit breakers prevent cascading failures by automatically stopping requests to unhealthy providers. When a provider fails repeatedly, the circuit breaker "opens," immediately rejecting requests without attempting the actual call. After a timeout period, it enters a "half-open" state to test if the provider has recovered.
Circuit Breaker States
Closed: Normal operation. Requests flow through to the provider. Failures increment a counter.
Open: Provider is unhealthy. All requests fail immediately without calling the provider. Prevents wasted resources and allows provider to recover.
Half-Open: Testing recovery. Limited requests pass through to verify provider health. If successful, circuit closes; if failures continue, circuit reopens.
from enum import Enum
from datetime import datetime, timedelta

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timedelta(seconds=timeout)
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        """Execute function with circuit breaker protection."""
        if self.state == CircuitState.OPEN:
            if datetime.now() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN  # Timeout elapsed; probe for recovery
            else:
                raise Exception("Circuit breaker is OPEN")
        try:
            result = func(*args, **kwargs)
            self.record_success()
            return result
        except Exception:
            self.record_failure()
            raise

    def record_success(self):
        """Reset on successful call."""
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def record_failure(self):
        """Track failures and open circuit if threshold exceeded."""
        self.failure_count += 1
        self.last_failure_time = datetime.now()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN
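Wiring the breaker into failover could look like the following sketch: one breaker per provider, with call_provider(name, payload) as a hypothetical dispatch function.

breakers = {
    "openai": CircuitBreaker(failure_threshold=5, timeout=60),
    "anthropic": CircuitBreaker(failure_threshold=5, timeout=60),
}

def send_with_failover(payload, order=("openai", "anthropic")):
    """Try providers in order, skipping any whose circuit is open."""
    last_error = None
    for name in order:
        try:
            return breakers[name].call(call_provider, name, payload)
        except Exception as e:
            last_error = e  # Open circuit or provider failure; try the next one
    raise last_error if last_error else RuntimeError("No providers configured")

An open circuit raises immediately, so failover to the next provider costs nothing when a provider is already known to be down.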
Best Practices
Effective health checking requires more than technical implementation. Operational practices, monitoring integration, and team processes all contribute to achieving genuine reliability improvements.
Health Check Frequency
Balance detection speed against resource consumption. For most AI providers, checking every 30 seconds provides adequate coverage without excessive overhead. Increase the frequency for critical production systems where faster detection justifies the additional cost.
Failure Definition
Define failure comprehensively. HTTP errors are obvious failures, but slow responses may also indicate degraded performance. Consider latency thresholds that trigger warnings before actual failures occur, enabling proactive routing adjustments.
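A minimal classification sketch along these lines; the thresholds (2 s to warn, 5 s to fail) are illustrative, not prescriptive.

def classify_outcome(status_code, latency):
    """Map a response to a health signal, treating slowness as degradation."""
    if status_code >= 500 or latency >= 5.0:
        return "unhealthy"  # Hard failure or unacceptably slow
    if status_code == 429 or latency >= 2.0:
        return "degraded"   # Rate-limited or slow: warn and deprioritize
    return "healthy"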
Alerting Strategy
Configure alerts for health state transitions, not just failures. A provider moving from healthy to degraded warrants investigation before it becomes unhealthy. Trend-based alerting catches gradual degradation that individual checks might miss.
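Transition-based alerting can be as simple as diffing against the last known state; alert here is a placeholder for your paging or notification integration.

last_state = {}  # provider -> "healthy" | "degraded" | "unhealthy"

def update_state(provider, new_state, alert):
    """Fire an alert only when a provider's health state changes."""
    old_state = last_state.get(provider, "healthy")
    if new_state != old_state:
        alert(f"{provider}: {old_state} -> {new_state}")
    last_state[provider] = new_state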
Key Principle
Health checks should test the full request path, including authentication and model availability. Simple endpoint pings may return success even when actual API calls would fail due to rate limits or configuration issues.